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ABSTRACT 

Efficient  processing  of  views  is  critical  to  many  real 
world  applications,  particularly  real  time  applications  such 
as  surveillance  systems  which  support  military  applications. 
This  thesis  compares  the  performance  of  three  view 
materialization  strategies:  semi-materialization,  full 
materialization  and  query  modification.  This  thesis  first 
develops  a  program  that  generates  databases  according  to  user 
specification.  Second  the  generated  databases  are  used  to 
conduct  an  empirical  study  on  the  three  view  materialization 
strategies  using  select-project-join  and  general  expression 
views.  The  results  of  the  study  indicate  that  for  select- 
project-join  view  definitions,  semi-materialization  performed 
best  for  higher  values  of  P,  lower  values  of  1,  fv  and  all 
values  of  fq  with  the  database  stored  on  hard  disk.  Full 
materialization  performed  best  for  lower  values  of  P,  I,  and 
all  values  of  fv  with  the  database  stored  in  RAM.  The  results 
also  indicate  that  the  semi-materialization  strategy  is  the 
best  view  processing  method  for  general  expressions. 
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I .   INTRODUCTION 

A .   BACKGROUND 

A  database  is  a  computer  based  record  keeping  system  that 
contains  information  used  to  support  an  organization's 
tactical  (short  range)  and  strategic  (long  range)  goals.  For 
example,  a  database  for  a  sales  organization  could  contain 
customer,  employee,  sales  and  inventory  data. 

Several  data  models  are  available  to  organize  the 
information  within  the  database  so  that  it  can  be  utilized  in 
an  efficient  manner.  One  of  the  most  common  data  models  is  the 
relational  model.  This  method  organizes  data  in  terms  of 
tables  (relations),  rows  (tuples)  and  columns  (attributes). 

Tables  can  be  classified  as  either  base  tables  or  views. 
A  base  table  is  a  table  that  physically  exists  in  its  own 
right.  A  view  maybe  thought  of  as  a  virtual  table,  in  as  much, 
that  it  does  not  (normally)  exist  within  its  own  right  but  is 
instead  derived  from  one  or  more  underlying  base  tables 
[Ref.  1].  The  view  is  stored  as  a  definition  in  the  data 
dictionary  and  is  combined  with  a  user's  query  to  retrieve  the 
requested  data  from  the  base  tables. 


The  use  of  views  allows  for  the  structuring  and  limiting  of 
the  information  retrieved  by  a  given  query.  This  feature 
allows  the  user  to  receive  data  that  is  relevant  to  the 
application  and  limits  unauthorized  user  access  to  other 
critical  data. 

Recently  several  proposals  have  considered  storing  some 
form  of  the  processed  view  to  eliminate  the  need  to  evaluate 
the  view  definition  from  scratch  every  time  it  is  queried.  The 
first  approach,  known  as  full  materialization,  stores  the 
fully  processed  view  as  a  physical  table.  This  approach  has 
the  advantage  of  increasing  the  efficiency  of  the  queries  on 
the  view  ,  but  incurs  an  additional  expense  of  maintaining  the 
materialized  view.  To  overcome  this  problem,  a  second 
approach,  called  semi-materialization,  was  proposed  whereby  a 
partially  processed  rather  than  a  fully  materialized  view  is 
stored.  This  approach  redundantly  stores  data  that  represents 
selections  and  projections  of  individual  relations,  thus 
allowing  efficient  evaluation  of  the  view  definition  while 
being  easy  to  maintain. 

View  performance  processing  is  directly  related  to  the 
performance  of  real  time  applications  such  as  surveillance 
systems  which  support  military  operations.  These  systems 
receive  periodic  environmental  updates  from  various  sensors 
which  are  used  to  evaluate  a  view.  Any  delay  processing  the 
sensor  data,  which  is  typically  time  sensitive,  into  usable 
information  could  render  the  information  late  and  unusable. 


Faster  view  processing  used  in  conjunction  with  real  time 
systems  will  significantly  improve  the  response  time  of  these 
systems. 

B.   OBJECTIVE 

The  objective  of  this  thesis  is  to  compare  empirically  the 
performance  of  three  view  processing  strategies:  query 
modification,  semi-materialization  and  full  materialization. 
The  research  attempts  to  verify  the  analytical  results  which 
have  indicated  that,  in  general,  the  semi-materialization 
strategy  is  the  best  method  for  processing  general  expression 
views  [Ref.  2].  To  accomplish  this  goal,  this  research 
develops  a  Data  Generation  Program  to  produce  test  databases 
according  to  user  specifications.  The  test  databases  are  then 
used  to  compare  the  performance  of  three  view  processing 
strategies  for  two  view  expressions  and  under  different 
parameter  settings,  using  a  simulation  program  that  was 
developed  by  Lt  Jesse  South  [Ref.  3],  Performance 
results  were  then  collected,  analyzed  and  plotted  for 
presentation  in  this  thesis. 


C.  SCOPE  AND  METHODOLOGY 

This  thesis  accomplishes  the  following: 

1 .  Develops  a  generic  database  generating  program  using 
ANSI  C  to  generate  test  databases  according  to  user 
specifications . 

2.  Compares  the  performance  of  three  view  materialization 
strategies  for  select-project-join  expressions  with  the 
database  stored  in  Random  Access  Memory  (RAM)  and  hard 
disk. 

3.  Tests  the  three  view  strategies  using  general 
expressions  with  the  database  stored  on  RAM  and  on 

hard  disk  under  different  parameter  settings,  collecting 
the  results  and  comparing  them  with  analytical  results. 

4.  Uses  the  results  to  draw  conclusions  and  determine  the 
conditions  under  which  each  strategy  performs  the  best. 

D.  ORGANIZATION  OF  STUDY 

This  thesis  is  organized  as  follows.  Chapter  II  overviews 
the  three  view  processing  strategies.  Chapter  III  provides  a 
detailed  description  of  the  Data  Generation  Program.  Chapter 

IV  presents  the  performance  results  of  the  empirical  study  and 
compares  them  to  the  results  of  the  analytical  study.  Chapter 

V  presents  conclusions  based  on  the  study  and  suggests  areas 
for  future  research. 


II.   VIEW  MATERIALIZATION  STRATEGIES 

The  purpose  of  this  chapter  is  to  provide  a  general 
overview  of  the  three  view  materialization  strategies  -  query 
modification,   semi-materialization  and  full 
materialization . 

A.   QUERY  MODIFICATION 

The  conventional  method  for  view  processing  for  queries  is 
query  modification.  This  method  stores  a  view  definition  in 
the  data  dictionary.  This  view  definition  is  retrieved  from 
the  data  dictionary  when  a  query  is  issued  on  the  view  and 
combined  with  the  user  query  into  an  equivalent  query  on  the 
underlying  base  tables.  This  query  is  subsequently  processed, 
and  the  results  returned  to  its  user.  Consider  the  following 
database  schema: 

EMP ( E# , ENAME , ADDRESS , SALARY, TITLE ) 

POS(E#,S#, LEVEL) 
and  the  corresponding  view  definition  COMBATSTAFF: 

lie .  ENUM,  e .  ENAME,  e .  SALARY 
(ap.LEVEL>3  (EMP>+POS)  ) 

Now  when  a  query  is  issued  against  COMBATSTAFF: 

lie.  ENUM,  c.  ENAME (o C.  S 'ALARY> 30,  000  {COMBATSTAFF)  ) 


The  view  mechanism  translates  the  query  into  the  equivalent 
query  on  the  base  relations: 

lie .  ENUM,  8 .  ENAME 
(oe.  salary> 30,0 0 0 Ap .  LEVEL> 3  ( EMPxPOS)  ) 

The  resulting  query  is  optimized  to  determine  the  best  access 
path  and  then  executed. 

B.   FULL  MATERIALIZATION 

This  method  creates  an  actual  table  based  on  the  view 
definition.  The  resulting  table  is  used  to  perform  user 
queries,  thus  avoiding  the  cost  of  repeatedly  retrieving  a 
view  definition  and  creating  equivalent  queries  on  the  base 
relations.  This  method  works  quite  well  for  processing 
queries,  but  is  costly  when  the  frequency  of  update  is  high, 
since  the  full  materialized  view  must  be  maintained. 

Updates  are  defined  as  a  transaction  which  performs  a 
sequence  of  tuple  insertions,  tuple  deletions,  and  tuple 
modifications  on  a  relation(s).  Suppose  that  a  set  of  tuples 
A  is  added  to  a  relation  and  a  set  of  tuples  D  is  deleted  from 
the  same  relation.  The  tuple  sets  A  and  D  represent  the  net 
change  made  to  that  relation.  In  that  case,  a  tuple  which  is 
inserted  and  deleted  in  the  same  transaction  would  not  appear 
in  either  tuple  set  A  or  D. 


Using  this  method,  the  net  results  of  an  update 
transaction  could  be  used  as  a  basis  for  a  differential 
algorithm  to  update  the  materialized  view. 

In  fact  this  method  works  quite  well  using  select-project- 
join  expressions  because  selections  and  projections  can  be 
performed  over  unions.  Using  the  view  definition  in  the 
previous  section  and  limiting  the  updates  to  the  POS  relation 
for  simplicity,  the  view  expression  becomes: 

COMBATSTAFF'=COMBATSTAFF-Tle .  ENUM,  e .  ENAME,  e .  SAIARY 

(p.LEVEL>3  (D^POS)  ) 
[Me .  ENUM,  e .  ENAME,  e .  SALARY (p .  LEVEL>  3  {Af+POS)  ) 

The  above  expression  shows  that  the  fully  materialized  view 
can  be  maintained  by  computing  the  last  two  expressions  and 
inserting  them  into  or  deleting  them  from  the  materialized 
view  COMBATSTAFF. 

Unfortunately  a  similar  expression  can  not  be  derived  if 
a  general  expression  is  used  in  the  view  definition.  At 
present  no  efficient  differential  algorithm  exists  for 
performing  incremental  updates  for  general  expressions.  This 
fact  necessitates  that  a  complete  re-evaluation  of  the  view 
expression  be  accomplished  after  each  update  to  the  base 
relations.  The  cost  of  re-evaluating  a  fully  materialized  view 
can  be  prohibitive  as  the  frequency  of  updates  for  the  base 
relations  increase,  which  is  the  chief  problem  associated  with 
this  method. 


C.   SEMI -MATERIALIZATION 

This  method  stores  redundant  subsets  of  carefully  chosen 
data  from  individual  base  tables.  These  redundant  subsets  are 
stored  as  actual  tables  and  represent  an  intermediate  state  of 
computing  the  view.  Each  subset  is  a  projection  and  selection 
of  the  base  table(s)  thus  making  the  construction  of  the  view 
less  costly  than  using  the  base  relations. 

The  redundant  data  is  clustered  on  the  join  attribute(s) 
which  allows  for  the  efficient  construction  of  the  view. 
Updates  to  the  base  relations  are  screened  to  determine  if  the 
update  affects  the  redundant  tables.  If  it  does,  it  is 
inserted  into  or  deleted  from  the  appropriate  redundant 
tables. 

The  following  redundant  subsets  would  be  stored  to  support 
this  technique: 

EMP'=He .  ENUM,  e .  ENAME,  e .  SALARY  ( EMP) 
POS;=Up .  ENUM ( op .  LEVEL>  3  ( POS)  ) 

This  view  is  combined  with  a  user  query  to  form  an  equivalent 
query  on  the  redundant  relations: 

lie.  ENUM,  C.  ENAME,  C.  SALARY  (EMP'^POS') 


When  queried  the  following  equivalent  view  is  created  using 
the  redundant  tables  and  the  view  definition: 

lie' .  ENUM,  e' .  ENAME(e' .  SALARY> 30 ,  000  (EMPf*+POS')  ) 

This  method  becomes  more  complicated  as  additional 
insertions  and  deletions  occur.  Since  more  than  one  base 
relation  may  have  been  the  source  of  the  tuples  used  in  the 
materialized  view,  it  becomes  increasing  difficult  to 
determine,  when  or  if  a  record  should  be  removed  from  the 
view. 

To  alleviate  this  problem  each  materialized  view  must  keep 
a  duplicate  count  of  the  number  of  tuples  contributed,  by  each 
redundant  subset,  to  the  tuples  in  the  materialized  view  when 
the  subsets  are  joined.  The  count  should  be  incremented  or 
decremented  depending  on  the  transaction  until  the  count 
becomes  zero . 


III.   DATA  GENERATION  PROGRAM 

The  purpose  of  this  chapter  is  to  describe  the  Data 
Generation  program.  According  to  user  specifications  the 
program  generates  text  files  that  are  used  subsequently  to 
build  the  test  database.  As  shown  in  Figure  1,  the  program 
reads  control  information  from  a  text  file  created  by  a  user 
or  generated  by  the  simulation  program  and  generates  the 
specified  text  files.  The  program  allows  the  user  to  control 
the  number  of  records  (cardinality  of  the  relation),  the  data 
type  (ALPHA,  NUMERIC  or  ALPHANUMERIC  characters),  the  size  of 
each  field  and  the  number  of  fields  generated  for  each  record. 
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Figure  1 :   Data   generation   program   data 
overview. 
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The  process  to  generate  the  data  is  hidden  from  the  user 
by  using  a  fixed  format  control  file  as  the  user  interface  to 
the  program. 

The  program  is  written  in  ANSI  C  to  increase  portability 
of  the  source  code  and  to  minimize  the  changes  necessary  to 
transfer  the  program  to  a  mini  or  mainframe  environment.  The 
maximum  size  of  the  text  file  generated  by  the  program  is 
limited  only  by  the  secondary  storage  available  on  the 
platform  in  use. 

A.   GENERAL  DESCRIPTION 

The  Data  Generation  program  receives  control  data  from  the 
text  file  "DATA_IN" .  The  information  in  the  control  data  file 
is  effectively  divided  into  two  sections.  The  first  section 
determines  the  number  of  records,  fields  per  record  and  the 
name  of  the  output  file.  The  second  section  defines  each  field 
within  the  record  by  type  of  information  for  the  field  (Field 
Type: Alpha,  Numeric  or  Alphanumeric):  the  number  of  characters 
for  each  attribute  (Field  Width):  the  upper  and  lower  bounds 
for  any  arrays  and  the  incremental  value  used  for  counters. 
Data  Generation  program  reads  the  data  into  a  set  of  linked 
lists  which  are  passed  to  the  control  modules  by  the  main 
module  to  create  each  record. 
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B.   REQUIREMENTS 

The  requirement  for  the  Data  Generation  program  was  based 
on  a  user  request  that  a  new  generic  data  generating  program 
be  written  in  the  C  programming  language  to  replace  the 
previous  database  generating  program. 


SAMPLE  INPUT  RLE 


300  5  EMP  DATA 


A.  RECORD  STRUCTURE 


1.  NUMBER  OF 
RECORDS 

2.  NUMBER  OF 

FIELDS 

3.  TEXT  RLE 
NAME 


A  5  R  D  1  T 


B.   RELD  STRUCTURE 

1.  FIELD  TYPE 
A:  ALPHA 
N:  NUMERIC 
a  ALPHA- 

4. ARRAY 
LOWER 
BOUND 

NUMERIC 

2.  FIELD  WIDTH 

5.  INCREMENT 

3.  FIELD  INFO 
S:  SEQUENTIAL 
B:  ARRAY 
R:  RANDOM 

6.  ARRAY 
UPPER 
BOUND 

D:  DEFAULT 

Figure  2 :  Data  generation  program  parameters  and  parameter 
definitions. 
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The  program  accepts  the  following  inputs  and  generates  a 
text  file  used  to  create  test  databases: 

1.  Number  of  text  files  required. 

2.  Number  of  records  per  text  file. 

3.  Name  of  the  text  file. 

4.  Number  of  fields  per  record. 

5.  Size  of  each  field. 

6.  Type  of  information  in  each  field. 

7.  Number  of  distinct  values  in  each  field. 

8.  Upper  and  lower  limits  for  the  fields. 

9.  Input  reference  for  randomly  generated  characters. 

To  simplify  the  performance  analysis  several 
assumptions  were  made  about  the  data  generated  for  the  test 
database.  The  first  assumption  was  that  the  values  for  each 
field  in  the  column  were  uniformly  distributed  over  the  range 
of  values  in  the  column.  The  second  assumption  considered  each 
value  in  a  given  column  to  be  independent  of  the  values  in  the 
other  columns. 

C.   NOTES  ON  PROGRAM  DESIGN 

The  requirement  for  maximum  program  flexibility  dictated 
a  "layered"  design  approach  be  used,  creating  individual 
primitive  modules  to  produce  the  varied  types  of  output  data 
requested  by  the  user. 
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To  keep  the  coupling  between  the  modules  as  loose  as 
possible,  the  use  of  global  variables  is  minimized  and  when 
feasible,  only  a  single  record  structure  is  passed  between 
the  called  and  calling  modules. 


MAIN  MODULE 
PASSES  CONTROL 
&  RECORD 
STRUCTURE  1 


INPUTS 

7.  FIELD  STRUCTURE 

(LINK  LIST) 

8.  INCREMENT  AND 
BOUNDS 


TYPE 
NUMERIC 
CALLS 
PRIMITIVE 


TYPE 
ALPHA 
CALLS 
PRIMITIVE 


TYPE 
ALPHA  - 
NUMERIC 
PRINTS 
OUTPUT 


u 


PRIMITIVE  MODULES        i.« 

GENERATE  AND  PRINT 
OUTPUT  TO  TEXT  FILES 


PRIMITIVE  MODULES  1.2.1 

GENERATE  AND  PRINT 
OUTPUT  TO  TEXT  FILES 


Figure  3:  Data  generation  program  control  module 
overview. 


Each  primitive  module  prints  its  output  directly  to  the 
output  text  file  with  the  exception  of  the 
generatenumericarray  module  which  returns  its  numeric  output 
to  the  random_generator  module  for  conversion  to  alpha 
characters,  if  required. 
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This  method  was  chosen  after  trial  and  error  as  the  best 
method  for  facilitating  the  tracing  of  data  and  control  flow 
through  the  modules. 

The  rand( )  C  library  function  was  used  to  generate  random 
data.  Two  C  language  record  structures  were  used  to  establish 
the  command  language  between  the  data  generation  program,  the 
control  file  and  the  view  materialization  simulation  program. 

D.   PROGRAM  MODULE  OVERVIEW 

A  brief  description  of  each  module  is  provided  to  clarify 
the  control  and  data  flows  that  are  described  in  Section  E. 

1 .  Main  Module 

The  main  module  opens  and  closes  the  input-output 
files,  loads  the  control  data  into  the  record  structures  and 
directs  the  flow  of  the  control  data  to  the  applicable  modules 
for  data  generation. 

2.  Time  Hack  Module 

The  time  hack  module  uses  the  system  clock  to  compute 
the  base  reference  for  the  generation  of  random  alpha  and 
numeric  output  values. 

3.  Type_Numeric  Module 

The  type_numeric  module  is  called  by  the  main  module 
to  generate  a  numeric  string,  or  call  the  sequential  counter, 
randomgenerator  or  the  bounded_sequential_array  modules. 
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4.  Type_Alpha  Module 

The  typealpha  module  is  one  of  three  process  control 
modules  used  to  determine  the  type  of  characters  in  a  field. 
The  module  receives  its  input  in  the  form  of  a  record 
structure  passed  from  the  main  module  to  generate  a  string  of 
alpha  characters,  or  to  call  the  random_generator  or  the 
bounded_sequential_array  modules . 

5 .  Type_Alphanumeric  Module 

The  type_alphanumeric  module  is  the  last  process 
control  module  and  generates  a  single  variable  length  string 
of  alpha  and  numeric  characters  when  called  by  the  main 
module. 

6.  Bounded_Sequential_Array  Module 

The  bounded_sequential_array  module,  which  is  called 
by  either  the  type_alpha  or  typenumeric  modules,  receives 
three  numeric  values  from  the  calling  module.  The  values 
determine  the  array  lower  bound,  the  number  of  array  elements 
and  the  incremental  value  of  each  element.  The  rand( )  function 
is  used  to  generate  a  random  index  number  to  select  the  array 
element  value  that  is  printed  in  the  output  file. 
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7.  Random_Generator  Module 

The  random_generator  module  is  called  in  the  same 
manner  as  the  bounded_sequential_array  module.  The  module 
determines  if  a  character  or  numeric  value  is  required,  calls 
the  generatenumericarray  module  to  produce  the  required 
value  and  prints  the  value  or  character  in  the  output  file. 

8.  Random_Long_Array  Module 

The  random_long_array  module  is  called  by  the 
type_numeric  module  to  produce  a  random  numeric  output 
employing  the  same  rand(  )  function  that  was  used  in  the 
boundedsequentialarray  module.  The  module  computes  the  array 
size  and  determines  if  the  number  of  array  elements  exceeds  a 
preset  limit. 

The  module  will  compute  the  output  value  using  the 
upper  bound  value  and  the  rand( )  function  to  conserve  main 
memory  rather  than  allocating  space  for  the  array  if  the 
preset  limit  is  exceeded.  This  method  was  used  to  prevent  the 
program  from  using  memory  unnecessarily. 

9.  Counter  Module 

The  counter  module  is  called  by  the  type_numeric 
module  and  uses  global  values  to  generate  up  to  three 
independent  sequential  counters. 
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10.  Generate_Numeric_Array  Module 

The  generate_numeric_array  module  is  called  by  the 
random  generator  module  to  produce  a  second  independent 
bounded  array  similar  to  the  bounded_sequential_array  module 
except  the  random  numeric  output  from  the  module  is  returned 
to  the  calling  module  for  possible  conversion  to  an  alpha 
character. 

11.  Print  Modules 

The  print  modules  are  all  used  to  send  debugging  data 
to  a  text  file  called  "output.txt"  that  is  controlled  by  a 
toggle  called  "TROUBLESHOOTING". 

E.   DETAILED  DATA  AND  CONTROL  FLOW 

The  Data  Generation  Program  is  called  by  a  batch  file 
which  reads  the  control  file  "DATA_IN".  The  input  data  is 
formatted  to  conform  to  the  two  record  structures  declared  in 
the  definition  section  of  the  program. 

Once  the  input  data  is  loaded  into  the  program,  the 
control  file  is  closed  and  the  output  file  is  opened.  The 
output  file  name  is  part  of  the  control  file  data.  Each  record 
structure  is  read  and  control  is  routed  to  the  appropriate 
control  module  based  on  field  type  (  ALPHA  "A",  NUMERIC  "N" 
and  ALPHANUMERIC  "0"). 


18 


The  type_numeric  module  will  be  used  to  trace  the  first 
data  flow  through  the  modules,  the  second  data  flow  be  traced 
using  the  typealpha  module  and  the  last  data  flow  will  use 
type_alphanumeric  module. 

"N"  is  the  field  type  read  by  the  main  module  in  the 
attribute  record  structure.  Control  and  the  attribute  record 
structure  is  passed  to  the  type_numeric  module  by  the  main 
module.  The  attribute  record  structure  is  read  by  the  module 
to  determine  the  field  information  (BOUNDED  SEQUENTIAL  ARRAY 
"B",  RANDOM  GENERATOR  "R" ,  COUNTER  "S" ,  RANDOM  LONG  ARRAY  "X" 
or  DEFAULT  "D" ) . 

The  field  information  type  read  by  the  module  is  "B"  and 
the  bounded_sequential_array  module  is  called.  The 
type_numeric  module  converts  the  lower  and  upper  bound 
character  strings  to  numeric  values  which  are  passed  to  the 
bounded_sequential_array  module  along  with  the  incremental 
data.  The  module  uses  the  input  data  to  determine  array  size, 
the  lower  bound  and  increment. 

Memory  is  allocated  and  the  array  is  filled.  The  rand( ) 
function  and  the  array  size  are  used  to  compute  an  random 
index  number  to  select  the  array  element  value  to  be  printed 
in  the  applicable  field  in  the  output  field. 
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Control  is  returned  to  the  main  module  and  the  next 
attribute  record  structure  is  read.  "A"  is  the  next  field  type 
read  by  the  main  module:  control  and  the  attribute  record 
structure  is  passed  to  the  type_alpha  module. 

The  type_alpha  module  reads  the  attribute  record 
structure.  "R"  is  the  field  information  read  by  the  module. 
The  randomgenerator  module  is  called,  the  lower  bound 
character  and  upper  bound  character  strings  are  read  by  the 
typealpha  module.  The  character  strings  are  converted  to 
numeric  values  and  passed  along  with  the  incremental  data  to 
the  random_generator  module. 

The  random_generator  module  determines  if  the  integer 
values  represent  alpha  characters  or  numeric  values.  In  this 
case,  the  values  represent  the  upper  case  letters  "A" (lower 
bound),  "R"  (upper  bound),  and  the  increment  value  of  1.  The 
random_generator  computes  the  array  size,  and  passes  the 
values  to  the  generate_numeric_array  module  to  generate  the 
array. 

The  generate_numeric_array  module  allocates  and  fills  the 
array.  The  rand( )  function  is  used  to  select  an  array  value 
which  is  returned  to  the  randomgenerator  module.  The  value  is 
converted  to  an  alpha  character  in  the  random_generator  module 
and  printed  in  the  applicable  field  in  the  output  file. 
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Control  is  returned  to  the  main  module  and  the  next 
attribute  record  is  read.  "0"  is  the  next  field  type  read  by 
the  main  module:  control  and  the  attribute  record  structure  is 
passed  to  the  type_alphanumeric  module. 

Unlike  the  other  control  modules  the  typealphanumeric 
module  does  not  call  other  modules.  The  attribute  record 
structure  is  read  to  determine  the  total  number  of  alpha  and 
numeric  characters  required.  Total  field  width  is  the 
aggregate  of  the  two  character  strings. 

The  characters  are  generated  sequentially  "A  -  Z"  for  the 
alpha  string  and  "0  -  9"  for  the  numeric  string.  The 
characters  are  printed  to  the  output  file  one  at  a  time  until 
the  field  is  completed. 

The  process  for  the  other  field  information  types  is 
similar  for  both  the  type_alpha  and  typenumeric  modules. 
ERROR  handling  is  limited  to  verification  of  the  input  data 
and  the  opening  of  the  required  input  and  output  files. 

F.   TESTING 

Testing  was  conducted  on  each  module  when  it  was  created 
or  updated.  Small  text  files  which  simulated  the  input  data 
for  the  particular  module  being  tested  was  modified  to  test 
each  module  over  a  wide  range  of  values.  The  entire  program 
was  tested  using  a  variety  of  control  files  to  create  text 
files  from  50  to  over  60,000  records  with  at  least  10 
attribute  fields  per  record. 
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IV.   PERFORMANCE  ANALYSIS 

The  purpose  of  this  chapter  is  to  describe  and  report  the 
results  of  the  empirical  study  conducted  on  the  three  view 
materialization  strategies  --  query  modification,  full 
materialization  and  semi-materialization  --  using  select- 
project-join  (Model  1)  and  general  expressions  (Model  2). 
Performance  testing  was  conducted  on  databases  stored  in 
Random  Access  Memory  (RAM)  and  on  a  hard  disk  using  a  computer 
with  an  INTEL  80386SX  processor  running  at  20  MHz.  The 
simulation  program  is  written  in  ANSI  C  with  embedded  SQL 
commands  to  access  the  INGRES  relational  database  system. 

A.   SUMMARY  OF  THE  RESULTS  FOR  THE  ANALYTICAL  MODEL 

Review  of  the  results  for  the  analytical  model  indicate 
that  view  processing  strategies  are  most  sensitive  to  the 
frequency  of  updates  (P),  the  selectivity  of  the  view 
predicate  (fv),  the  selectivity  of  the  query  predicate  (fq) 
and  number  of  tuples  (1)  [Ref.  2].  For  select-project-join 
expressions,  and  except  for  high  values  of  P,  both  full  and 
semi-materialization  performed  better  than  query  modification. 
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Higher  values  of  P,  fv,l  or  lower  values  of  fq  favor  semi- 
materialization  over  full  materialization.  At  lower  values  of 
P,  fv,  and  1  full  materialization  is  slightly  better  than 
semi-materialization  as  the  update  costs  tend  to  be  low. 

For  general  expressions  semi-materialization  performed 
better  for  all  parameter  values  except  for  very  low  values  of 
P.  The  absence  of  an  efficient  differential  algorithm  for 
performing  incremental  updates  makes  the  use  of  general 
expression  an  unattractive  alternative. 


B.   EXPERIMENTAL  SETUP 

The  parameter  definitions,  parameter  default  values, 
access  paths  for  the  relations,  query  and  view  definitions  and 
the  profiles  of  the  database  relations  which  were  used  for 
the  experiment  are  shown  in  Figures  4  through  8,  respectively. 


N 

Cardinality  of  the  Relation. 

K 

Number  of  update  transactions  on  the  base  relations 

1 

Total  number  of  tuples  modified  by  each  update  transaction 

q 

Number  of  times  the  view  is  queried 

p 

Probabffitity  that  a  given  operation  is  an  update 

fv 

Selectivity  of  the  view  predicate  (fraction  of  tuples  in  view) 

fq 

Selectivity  of  query  predicate  (fraction  of  tuples  retrefved  by 

the  query  on  the  view) 

Figure  4:     View 
definitions. 


materialization 
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N 

sooo 

k 

-too 

1 

26 

q 

100 

R 

o.e 

tv 

O.I 

♦q 

0.1 

Figure  5:  Default  values  for  parameters. 


Relation  (s) 

Access  path 

EMP 

Clustered  index  on  Join  Hold    •_num 

POS 

Clustered  Index  on  level 

EMP  ' 

Cluttered  Index  on  Join  field     •-num 

POS    ' 

Clustered  index  on  level 

Figure  6:  Access  paths  for  relations. 

The  parameters  that  were  considered  for  the  sensitivity 
analysis  include  the  following  for  each  model  tested: 

1 .  The  fraction  of  updates  to  the  total  number  of 
operations  (P).  This  parameter  is  controlled  by  varying  the 
number  of  update  transactions  on  the  base  relations  (k)  and 
the  number  of  times  the  view  is  queried  (g). 
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2.  The  selectivity  of  the  view  predicate  (fv)  or  the 
fraction  of  tuples  retrieved  in  the  view  with  regard  to  the 
control  relation  POS.  This  fraction  is  controlled  by  varying 
the  value  v_threshold  (view_cut)  -  that  is,  the  predicates  in 
the  view  definition. 

3.  The  selectivity  of  the  query  predicate  (fq)  or  the 
fraction  of  tuples  retrieved  by  the  query  on  the  view.  The 
fraction  (fq)  is  controlled  by  varying  the  value  qthreshold 
(query_cut)  -  that  is,  the  predicate  in  the  query. 

4.  The  number  of  tuples  modified  by  each  transaction  (I). 
This  parameter  is  controlled  by  varying  the  number  of  tuples 
per  update  generated  by  the  data  generation  program. 

5.  The  number  of  records  in  the  base  relation(s) 
(cardinality  of  the  relation). 

Performance  data  was  collected  for  view  definitions  using 
select-project-join  and  general  expression  predicates  with  the 
database  stored  in  RAM  and  on  hard  disk. 

The  database  and  the  data  generation  program,  view 
materialization  simulation  program  and  various  Ingres  program 
files  were  placed  in  separate  sub-directories  on  the  hard  disk 
or  in  two  similar  RAM  drives  (4MB  for  database  and  1.8MB  for 
the  other  files)  to  determine  if  eliminating  the  hard  disk 
access  time  (28ms  average)  would  significantly  improve  the 
performance  of  the  view  processing  strategies. 
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EXPRESSION 

VIEW  1: 

CREATE  VIEW    FULL_VIEW 

SELECT  E_NUM,  ENAME.  SALARY,  KEYNO 
WHERE    o.E   NUM   -  p.E_NUM        AND 
p.  LEVEL  :>— VIEWCUT 

VIEW  2: 

CREATE  VIEW  FULL_VIEW 

SELECT  E_NUM,    ENAME,   SALARY,    KEYNO 
WHERE   EXISTS 
(SELECT    * 

WHERE   e.E_NUM   -  p.E_NUM 
p_o.KEYNO  -p_l. KEYNO 
AND  p_i.  LEVEL  >- VIEWCUT) 

QUERY 

SELECT    E_NUM,   ENAME,    KEYNO 

WHERE    SALARY  >-  QUERYCUT 

VIEVV1:                                                                 VIEW2: 
LBQCND 

SELECT-PROJECT  JOIN                                      GENERAL  EXPRESSION 

Figure  7:  View  definitions  and  query  on  the  views 


Two  types  of  operations  were  conducted  on  the  test 
database,  a  series  of  update  transactions  on  the  base 
relations  which  modified  a  varying  number  of  tuples  per  update 
and  queries  issued  against  views. 
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SIZE        12                  12 

10 
11 

25                       6000 
12                        C8S 

CARD  (EMP)  -   500 

E«            D#        ENAME 
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SALARY    TITLE              JOBDESC 

VAL,       600           50           500 
SIZE        12             12           C20 

500 
C70 

10                  SOO                     600 
14                    C30                    CSO 

Figure  8:  Profile  of  database  relations. 
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The  average  elapsed  time  per  query  for  all  updates  and 
queries  is  used  to  compute  the  performance  of  each  strategy. 

C.   PERFORMANCE  ANALYSIS 

This  section  discusses  the  performance  of  the  three  view 
processing  strategies  for  view  definitions  which  used  select- 
project-  join  and  general  expression  predicates.  These 
strategies  were  applied  to  the  test  database(s)  produced  using 
the  EMP,  POS  and  SKILL  text  files  generated  by  the  data 
generation  program.  The  reporting  method  will  consist  of 
reviewing  the  results  for  each  parameter  used  in  the 
sensitivity  analysis  of  the  two  models. 

1.   Model  1  :  Select-Project- Join 

MODEL  1  uses  the  following  view  definition  with  a 
select-project- join  predicate  for  the  three  view  processing 
strategies: 

lie .  ENUM,  e .  ENAME,  e .  SALARY{p .  LEVELzviewcut  (EMPxPOS)  ) 

a.  Results  for  Database  in  RAM 

In  this  section,  the  results  of  the  sensitivity 
analysis  for  Model  1  for  the  database  in  RAM  is  presented. 
Figures  9  through  12  show  the  results  for  model  1  for  the  four 
different  parameter  values  when  using  the  ram  disk. 

In  general,  the  trends  computed  for  the  analytical 
model  were  supported  by  the  empirical  results  for  the  runs 
with  the  database  stored  in  RAM. 
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The  sensitivity  analysis  for  the  probability  of 
update  parameter  shows  semi-materialization  performs  best  for 
values  of  P   greater  than  0.5  with  the  database  in  RAM. 
Full  materialization  was  the  clear  winner  for  values  of  P  less 
than  0.5. 

This  tradeoff  occurs  because  for  values  greater 
than  0.5  the  cost  of  processing  gueries  for  full 
materialization  averages  .35  seconds  while  the  cost  to  perform 
updates  averaged  .7  seconds.  The  cost  per  guery  for  semi- 
materialization  averaged  .8  seconds  but  the  cost  for  updates 
averaged  to  only  . 2  seconds . 

As  the  number  of  updates  increased  the  update  cost 
for  semi-materialization  was  guartered  while  the  cost  for  full 
materialization  doubled. 

The  average  cost  for  guery  modification  was  4.3 
second  per  transaction.  Query  modification  exhibited  the  same 
trend  as  the  analytical  model . 
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Figure  9:  Total  cost  per  query  in  seconds  vs.  the 
ratio  of  updates  to  the  total  number  of  operations 
P. 


For  the  selectivity  of  view  parameter,  the 
performance  of  the  full  and  semi-materialization  strategies 
were  virtually  identical  for  values  of  fv  less  than  0.3.  The 
performance  of  semi -materialization  improved  over  the 
performance  of  full  materialization  as  the  value  of  fv 
increased. 

The  average  cost  per  update  was  . 78  seconds  and 
cost  per  query  was  .63  seconds  for  full  materialization  for 
values  of  fv  less  than  0.3.  As  expected,  the  cost  for 
performing  updates  increased  significantly  as  the  value  of  fv 
increased. 
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Figure  10:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  view  predicate  fv. 


The  average  cost  of  updates  for  semi- 
materialization  was  .76  seconds  while  the  cost  for  queries 
averaged  3.26  seconds  over  the  entire  range  of  values  for  fv. 

The  cost  for  query  modification  increased  as  the 
size  of  the  view  increased  over  the  range  of  fv  as  expected. 
The  empirical  results  for  query  modification  were  virtually 
identical  to  the  analytical  model  trends. 

Full  materialization  provided  the  best  performance 
for  all  values  of  fg  for  the  strategies  with  the  database  in 
RAM. 
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The  average  cost  per  update  for  full 
materialization  was  .7  seconds  while  the  cost  per  query 
averaged  2 . 2  seconds . 
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Figure  11:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  query  on  the  view  fq. 


The  average  cost  per  update  for  semi- 
materialization  was  .25  seconds  but  the  cost  per  query 
averaged  3.7  seconds.  Semi-materialization  conformed  to  the 
trends  indicated  in  the  analytical  results  for  fq. 

The  average  cost  per  transaction  for  query 
modification  was  6.2  seconds  which  conformed  to  the 
performance  noted  for  the  analytical  model . 
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There  was  very  little  difference  between  the 
performances  of  semi-  or  full  materialization  over  the  entire 
range  of  values  of  1  -  number  of  tuples  per  update  for  the 
database  in  RAM. 

Full  materialization  performed  slightly  better  for 
values  of  I  less  than  40.  Semi-materialization  performed  best 
for  values  of  1  from  50  to  80  and  greater  than  90. 

The  average  cost  per  update  for  semi- 
materialization  was  .32  seconds  while  the  average  cost  per 
query  was  .79  seconds  for  all  values  of  2.  The  average  cost 
per  query  for  full  materialization  was  .36  seconds  and  average 
cost  per  update  was  .87  seconds  over  the  same  range  of  1 
values. 
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Figure  12:  Total  cost  per  query  in  seconds  vs.  the 
number  of  tuples  modified  by  each  update  I. 
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Full  materialization's  performance  for  the  I  parameter  far 
exceeded  the  expectations  indicated  by  the  analytical  model. 
Query  modification's  performance  on  the  analytical  model 
indicated  a  slight  improvement  for  higher  values  of  I  which 
was  not  supported  by  the  empirical  data. 

b.      Results  for  Database  on  Hard  Disk 

In  this  section  the  results  of  the  sensitivity 
analysis  for  Model  1  on  hard  disk  are  presented  for  comparison 
to  the  results  for  the  three  strategies  with  the  database  in 
RAM. 

For  the  probability  of  update  parameter,  the 
transition  to  semi-materialization's  performance  exceeding 
full  materialization  occurs  at  0.34.  This  indicates,  as 
expected,  that  disk  access  time  when  added  to  the  cost  of 
performing  updates  has  an  significant  impact  on  the 
performance  of  full  materialization. 

Semi-materialization  provides  the  best  performance 
for  values  of  P  greater  than  0.34.  The  figure  shows  a  less 
steep  increase  for  values  of  P  greater  than  0.7  than  the 
results  with  the  database  in  RAM.  Note,  however  that  the 
processing  cost  in  RAM  is  less  than  50%  of  the  similar  cost  on 
the  hard  disk. 
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Figure  13:  Total  cost  per  query  in  seconds  vs.  the 
ratio  of  updates  to  the  total  number  of  operations 
P. 


For  the  selectivity  on  the  view  parameter,  semi- 
materialization  performed  best  for  all  values  of  fv.  The 
difference  in  the  performance  appears  to  be  due  to  the 
additional  cost  added  for  accessing  the  disk  to  update  the 
base  relation  plus  the  additional  disk  accesses  required  to 
update  the  view. 

In  general  the  trends  are  identical  to  the  trends 
exhibited  by  the  analytical  and  RAM  models. 
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Figure  14:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  view  predicate  fv. 


Semi-materialization  was  the  best  performer  for 
values  of  fq  that  were  less  than  0.4.  Full  materialization 
performance  was  better  for  values  greater  than  0.4.  The 
performance  for  both  semi-  and  full  materialization  was 
virtually  identical  for  values  of  fq   between  0.2  and  0.4. 

The  trends  for  the  three  strategies  conform  to  the 
results  shown  for  the  analytical  model. 
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Figure  15:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  query  on  the  view  fq. 


Semi-materialization  was  the  clear  winner  for  all 
values  of  1.  Full  materialization  's  performance  improved  for 
values  of  1  greater  than  80  but  did  not  out  perform  semi- 
materialization.  Query  modification's  performance  conformed  to 
both  the  analytical  and  RAM  models.  The  results  for  both 
semi-  and  full  materialization  exceeded  the  results  shown  for 
the  analytical  model. 
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Figure  16:  Total  cost  per  query  in  seconds  vs.  the 
number  of  tuples  modified  by  each  update  1. 


c.      Discussion  of  the  results  for  Model  1 

In  general  the  empirical  data  supported  the 
conclusions  presented  in  the  analytical  review  of  the  view 
materialization  strategies  [Ref.  4]. 

Semi-materialization's  performance  was  superior 
for  higher  values  of  P,  lower  values  of  1,  fv  and  all  values 
of  £g  for  the  database  on  the  hard  disk.1  Semi- 
materialization  performed  best  with  the  database  in  RAM  for 
values  of  P  greater  than  0.5,  fv  greater  than  0.3  and  for  1 
values  between  50  to  80  and  greater  than  90. 


1  Semi-materialization  was  outperformed  by  full 
materialization  for  only  the  first  value  of  fv  while  using  the 
RAM  disk  drive. 


37 


This  was  due  to  the  low  cost  per  update  for  semi- 
materialization  when  compared  to  the  other  strategies.  The 
cost  advantages  of  performing  queries  and  updates  on  the 
redundant  subsets  is  due  primarily  to  the  fact  that  any 
transactions  performed  using  semi-materialization  are  on 
smaller  table(s)  than  the  base  relations. 

Full  materialization  performed  best  for  lower 
values  of  P,  1  and  for  all  values  of  fq  for  the  database  on 
RAM.  As  expected  full  materialization  performed  best  when  the 
primary  transaction  was  a  query.  Surprisingly,  full 
materialization  overall  performance  on  RAM  was  quite  good  even 
for  the  parameters  for  which  it  was  not  the  best  performer. 
For  example,  in  Figure  12,  full  materialization's  performance 
was  nearly  identical  to  semi-materialization  over  the  entire 
range  of  values  for  1.  Similarly,  full  materialization's 
performance  was  not  significantly  worst  than  semi- 
materialization  for  values  of  fv  as  shown  in  Figure  10.  Full 
materialization  performed  best  with  the  database  on  hard  disk 
for  P  less  than  0.34  and  for  fq   values  greater  than  0.4. 

Query  modification  outperformed  full 
materialization  for  values  of  P  greater  than  0.82  for  RAM  and 
0.84  on  the  hard  disk.  This  was  due  to  the  very  high  cost  per 
update  for  full  materialization  as  discussed  previously. 
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2.   Model  2  :  General  Expressions 

MODEL  2  uses  the  following  general  expression  view 
definition  expressed  in  relational  calculus: 

e .  ENUM,  e .  ENAME,  e .  SALARYwhereB 
( e .  ENUM=p .  ENUMAp .  LEVEL*  vi  ewcu  t  ( EMPx POS)  ) 

a.  Results  for  Database  in  RAM 

The  results  for  the  database  in  RAM  are  displayed 
in  Figures  17  through  20.  The  trends  exhibited  here  are 
noteworthy  since  the  performances  for  all  the  strategies 
exceed  the  results  obtained  from  the  earlier  experimental  data 
but  tended  to  conform  to  the  analytical  model  [Ref.  4]. 

The  results  for  the  probability  of  updates 
parameter  are  displayed  in  Figure  17  and  indicate  semi- 
materialization  outperformed  guery  modification  for  all  values 
of  P.  It  also  outperformed  full  materialization  for  values  of 
P   greater  than  0.1. 
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Figure  17:  Total  cost  per  query  in  seconds  vs.  the 
ratio  of  updates  to  the  total  number  of  operations 
P. 


Semi-materialization's  average  cost  per  update  was 
0.22  seconds  while  its  cost  per  query  averaged  5.73  seconds 
over  the  entire  range  of  P  values.  Full  materialization 
averaged  a  cost  per  query  of  0.31  seconds  but  its  advantage 
was  offset  with  an  initial  update  cost  of  62.89  seconds.  Query 
modification's  cost  per  transaction  was  33.65  seconds. 

Semi-materialization  was  the  best  performer  for 
all  values  of  fv,  which  coincided  with  the  trend  for  the 
analytical  model.  The  simulation  results  for  query 
modification  and  full  materialization  were  better  than  the 
results  for  analytical  model  for  all  values  of  fv. 
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Semi-materialization's  cost  per  update  averaged 
0.68  seconds  and  its  average  cost  per  query  was  21.3  seconds 
over  the  entire  range  of  fv  values.  Full  materialization 
averaged  1 . 5  seconds  per  query  and  its  cost  per  update 
averaged  270  seconds.  Query  modification's  cost  per 
transaction  averaged  to  51.7  seconds  for  fv   values. 
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Figure  18:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  view  predicate  fv. 


The  analysis  for  the  selectivity  of  the  query 
parameter  shows  that  semi-materialization  was  the  most  cost 
effective  strategy  for  processing  queries  for  general 
expressions. 
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The  analytical  and  empirical  results  for  query 
modification  were  nearly  identical  for  the  entire  range  of  fq 
values.  Full  materialization  proved  to  be  the  worst  performer 
of  the   strategies,  as  displayed  in  Figure  19. 

Semi-materialization  cost  per  update  for  fq 
averaged  to  0.23  seconds  while  its  average  cost  per  query  was 
20.7  seconds.  The  average  cost  per  update  for  full 
materialization  was  63.5  seconds  and  its  cost  per  query  was 
1.3  seconds.  Query  modification's  cost  per  transaction  was 
46.1  seconds. 
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Figure  19:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  query  on  the  view  fq. 
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Figure  20  shows  that  semi-materialization 
outperformed  full  materialization  and  query  modification  for 
all  values  of  1. 
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Figure  20:  Total  cost  per  query  in  seconds  vs.  the 
number  of  tuples  modified  by  each  update  1. 


Semi-materialization's  average  cost  per  update  was 
0.28  seconds  and  its  cost  per  query  was  5.7  seconds  for  all 
values  of  1.  Full  materialization's  cost  averaged  to  0.32 
seconds  per  query  and  63.2  seconds  per  update.  Query 
modification's  transaction  cost  averaged  33.7  seconds. 
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The  slight  increase  in  performance  cost  over  the 
entire  range  of  1  anticipated  by  the  analytical  model  for 
semi-materialization  as  the  number  of  tuples  per  update 
increased  was  not  supported  by  the  simulation  results. 
Query  modification  and  full  materialization  conformed  to  the 
results  plotted  for  the  analytical  model  [Ref.  4], 
b.      Results  for  Database  on  Hard  Disk 

In  this  section  the  results  of  the  sensitivity 
analysis  for  Model  2  on  hard  disk  are  presented.  Figures  21 
through  28  show  the  results  for  model  2  for  the  five  different 
parameter  values  when  using  the  hard  disk. 

Unlike  the  experiment  conducted  in  RAM  for  the 
three  view  processing  strategies,  the  cardinality  of  the  POS 
relation  will  be  varied  from  7500  to  10,000.  The  methodology 
used  to  conducting  the  sensitivity  analysis  for  the  four  other 
parameter  values  (P,  fv,  fq  and  I)  was  exactly  the  same  as  the 
methodology  used  for  the  experiment  conducted  in  RAM. 
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The  results  for  the  probability  of  updates 
parameter  is  displayed  in  Figures  21  and  22  shows  that  semi- 
materialization  out  performed  query  modification  over  the 
entire  range  of  P  values.  Its  performance  was  better  than  full 
materialization  for  values  of  P  greater  than  0.1. 
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Figure  21:  Total  cost  per  query  in  seconds  vs.  the 
ratio  of  updates  to  the  total  number  of  operations 
P  for  7500  records. 
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Figure  22:  Total  cost  per  query  in  seconds  vs.  the 
ratio  of  updates  to  the  total  number  of  operations 
P  for  10,000  records. 
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The  reason  for  semi-materialization  performance  is 
its  low  cost  per  update  which  offset  its  average  cost  per 
query.  For  an  cardinality  of  7500  records,  semi- 
materialization's  average  cost  per  update  was  1.55  seconds  and 
19.82  seconds  per  query.  Its  average  per  update  for  an  10,000 
record  cardinality  was  3.41  seconds  with  a  cost  per  query  of 
21.21  seconds. 

Full  materialization  performed  best  for  a  P  value 
of  0.1  or  less  for  both  cardinality  values  but  the  extremely 
high  cost  of  its  first  update  (250  seconds  for  7500:  417 
seconds  for  10,000)  quickly  overcame  its  cost  advantage  for 
processing  queries. 

Query  modification  outperformed  full 
materialization  for  P  greater  than  0.2  because  of  full 
materialization  high  cost  per  update. 

Semi-materialization  outperformed  both  query 
modification  and  full  materialization  over  the  entire  range  of 
fv  values.  The  cost  per  update  for  semi-materialization  with 
a  cardinality  of  7500  was  3.7  seconds  and  its  cost  per  query 
was  51.2  seconds  for  all  values  of  fv.  Semi-materialization's 
cost  per  update  for  a  cardinality  of  10,000  records  was  7.1 
seconds  while  its  cost  per  query  was  59.0  seconds.  Query 
modification  cost  per  transaction  averaged  102  seconds  for 
7500  records  and  180  seconds  for  10,000  records. 
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Full  materialization  cost  per  update  increased  at  a  rate  of 
150%  for  7500  records  and  doubled  for  10,000  records  as  the 
size  of  the  view  increased. 
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Figure  23:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  view  predicate  fv  on  7500 
records. 
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Figure  24:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  view  predicate  fv   for  10,000. 
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As  shown  in  Figures  25  and  26,  semi- 
materialization  outperformed  query  modification  and  full 
materialization  for  all  values  of  fq. 

The  average  cost  per  update  for  semi- 
materialization  with  a  cardinality  of  7500  was  1.5  seconds  and 
its  cost  per  query  was  64.5  seconds. 

Semi-materialization's  averaged  costs  for  a 
cardinality  of  10,000  was  2.8  seconds  for  updates  and  73.0 
seconds  for  queries. 
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Figure  25:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  query  in  seconds  on  the  view  fq 
for  7500  records. 
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Figure  26:  Total  cost  per  query  in  seconds  vs.  the 
selectivity  of  the  query  on  the  view  fq  for  10,000 
records. 


The  averaged  costs  for  full  materialization  for 
7500  records  was  2.5  seconds  per  query  and  256  seconds  per 
update. Query  modification's  costs  averaged  116.0  seconds  for 
7500  records  and  150.6  for  10,000  records. 
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As  expected  semi-materialization  was  the  clear 
winner  over  both  query  modification  and  full  materialization 
over  the  entire  range  of  1  values. 
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Figure  27:  Total  cost  per  query  in  seconds  vs.  the 
number  of  tuples  modified  by  each  update  I  for  7500 
records . 
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Figure  28:  Total  cost  per  query  in  seconds  vs.  the 
number  of  tuples  modified  by  each  update  1  for  10, 
000  records . 
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Semi-materialization's  performance  cost  averaged 
2.1  seconds  per  update  and  18.8  seconds  per  query  with  a 
cardinality  of  7500  for  all  values  of  I. 

Similarly  its  averaged  costs  for  10,000  records 
were  3.7  seconds  per  update  and  20.2  seconds  per  query.  Full 
materialization's  costs  averaged  0.71  per  query  and  253 
seconds  per  update  with  a  cardinality  of  7500.  Its  average 
costs  were  0.72  seconds  per  query  and  421  seconds  per  update 
for  10,000  records.  Query  modification  had  an  average  cost  of 
68.7  seconds  per  transaction  for  7500  records  and  103.2 
seconds  for  10,000  records. 

The  additional  cost  associated  with  using  a  hard 
disk  drive  and  increasing  the  cardinality  had  an  impact  on  the 
performance  experienced  for  both  query  modification  and  full 
materialization. 

The  cost  for  processing  queries  using  query 
modification  on  the  hard  disk  increased  three  fold  over  the 
cost  of  processing  the  query  in  RAM. 

The  cost  of  processing  updates  using  full 
materialization  increased  by  700%  over  the  same  cost  for 
updates  in  RAM  but  the  cost  for  query  processing  only  doubled. 


51 


The  costs  for  semi-materialization  also  increased 
by  a  factor  of  ten  for  updates  and  quadrupled  for  query 
processing.  The  increase  in  processing  costs  for  semi- 
materialization  was  offset  by  the  use  of  the  redundant  subsets 
of  the  base  relation  which  allowed  for  a  more  efficient 
construction  of  the  materialized  view. 

c.      Discussion  of  Results  for  Model   2 

Semi-materialization  performed  better  with  the 
database  in  RAM  and  on  hard  disk  for  the  entire  range  of 
values  of  cardinality  than  both  query  modification  and  full 
materialization  for  all  parameters  except  for  a  value  of  P 
less  than  0.1.  Full  materialization  performed  better  for  P 
less  than  0.1  because  its  average  cost  per  query  was  only  .42 
seconds  while  semi-materialization's  cost  averaged  15.4 
seconds . 

For  values  of  P  greater  than  0.1  the  cost  for 
performing  a  single  update  for  full  materialization  rose  to 
62.89  seconds  for  5000  records  in  RAM,  250  seconds  for  7500 
records  on  hard  disk, and  417  seconds  for  10,000  records  on 
hard  disk  which  offset  any  advantage  offered  by  full 
materialization  superior  performance  for  query  processing. 
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This  dramatic  cost  increase  for  update  processing  for  full 
materialization  using  general  expression  predicates  is  due  the 
lack  of  an  efficient  differential  update  algorithm.  This 
necessities  the  complete  re-evaluation  of  the  view  definition 
for  any  update  transaction. 

Query  modification  outperformed  full 
materialization  for  all  parameters  except  for  values  of  P  less 
than  0.35  for  all  values  of  cardinality.  As  indicated  above 
the  cost  of  a  single  update  transaction  for  full 
materialization  quickly  drives  its  cost  higher  than  an  other 
strategies. 

In  this  case  as  the  number  of  updates  increased 
the  aggregate  cost  for  query  modification  dropped  since  the 
cost  of  updating  base  relations  to  support  this  strategy  are 
not  timed.  The  cost  of  processing  four  or  less  updates 
(average  cost  per  update  for  query  modification  is  0)  combined 
aggregate  cost  for  processing  six  queries  (average  cost  per 
query  is  33.6  seconds)  is  more  than  the  cost  for  processing 
the  same  number  of  transactions  for  full  materialization. 

Full  materialization's  superior  performance  for 
lower  values  of  P  is  evident  and  is  based  its  low  cost  for 
processing  queries  on  the  view.  This  advantage  was  quickly 
overwhelmed  by  the  overhead  of  maintaining  the  fully 
materialized  view  [Ref.  4], 
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Semi-materialization  performed  best  on  hard  disk 
for  all  parameters  expect  for  a  value  of  P  of  0.1  or  less. 
Full  materialization  was  the  best  performer  for  that  value  of 
P  because  the  only  transaction  performed  for  those  values  was 
query  processing. 

As  indicated  above,  semi-materialization  is  the 
best  strategy  for  processing  view  definitions  using  general 
expressions  for  predicates. 

It  is  interesting  to  note  that  the  results  shown 
in  Figures  17  through  28  show  that,  in  general,  the 
performance  trends  for  the  view  processing  strategies  are  the 
same  for  P,  fv,  fq,  and  1  for  the  entire  range  of  values  for 
cardinality  of  POS. 
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V.   CONCLUSIONS  AND  RECOMMENDATIONS 

The  purpose  of  this  chapter  is  to  state  conclusions  based 
on  the  research  and  make  recommendations  for  improvements  and 
further  study  on  the  three  view  materialization  strategies. 

A.   CONCLUSIONS 

The  empirical  data  of  this  thesis  confirms  that  the  semi- 
materialization  strategy  is  best  method  for  processing  views 
with  predicates  using  general  expressions. 

The  performance  of  semi-materialization  with  the  database 
in  RAM  exceeded  the  trends  forecasted  for  general  expressions 
for  the  analytical  model  or  actual  results  achieved  on  the 
earlier  experimental  study  [Ref.  ?]  .  This  is  reasonable 
because  of  the  cost  penalty  paid  by  the  strategy  when  it 
becomes  necessary  to  access  a  hard  disk  to  perform  updates  and 
queries. 

The  trends  for  the  simulations  for  semi-materialization 

with  its  database  stored  in  RAM  indicate  it  may  be  suitable 

for  near  real  time  view  processing  using  general  expressions 

based  on  its  relatively  low  average  costs  for  updates  and 

queries. 
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Full  materialization  performed  well  for  lower  values  of  P 
due  to  its  low  average  cost  per  query  while  query 
modification's  performance  was  good  over  all  parameter  values 
but  both  strategies  are  less  efficient  than  semi- 
materialization  for  general  expressions. 

Select-project-join  view  definitions  with  the  database  in 
RAM  proved  to  be  the  most  cost  effective  method  for  view 
processing  (see  Figures  9-12). 

Overall  performance  for  all  parameter  simulations  using 
this  view  definition  and  the  ram  disk  drive  were  three  to  five 
times  faster  than  similar  runs  using  a  hard  disk  drive.  These 
savings  are  significant  when  considering  view  processing  for 
the  small  databases  inherent  to  tactical  environments. 

B.   RECOMMENDATIONS  AND  FUTURE  RESEARCH 

We  recommend  that  the  same  simulations  for  select-project- 
join  and  general  expressions  be  conducted  with  an  80486 
processor  with  a  minimum  of  16  MB  of  RAM  to  test  databases 
with  up  to  20K  records.  The  internal  8k  cache  and  math  co- 
processor should  significantly  reduce  the  processing  times  for 
all  three  strategies  using  either  view  definition. 

We  predict  that  this  approach  could  improve  the 
performance  of  the  semi-materialization  strategy  well  enough 
to  make  it  feasible  for  use  with  real  time  tactical  systems. 
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For  example,  the  electronic  order  of  battle  maintained  on 
board  a  tactical  aircraft  could  be  completely  updated  during 
or  enroute  to  an  engagement  using  information  received  by  its 
own  sensors  or  sensor  information  passed  from  other  sources. 

This  strategy  could  be  used  in  conjunction  with  the  Joint 
Ocean  Tactical  Surveillance  (JOTS)  system  to  provide  a  real 
time  computer  generated  picture  of  the  tactical  and  strategic 
environments . 

Used  in  this  manner,  JOTS  could  be  placed  on  board  classes 
of  ships  which  do  not  have  the  Naval  Tactical  Data  System 
(NTDS)  installed  on  board  at  a  tremendous  savings  over  back 
fitting  the  vessels  with  NTDS.  The  information  would  improve 
the  vessel ' s  mission  performance  by  keeping  the  Commanding 
Officer  constantly  updated  with  real  time  battle  group 
position  data  and  allow  for  the  information  received  by  that 
vessel's  sensors  to  be  incorporated  into  the  tactical  picture. 

We  also  recommend  conducting  more  simulations  on  actual 
databases  with  more  than  two  relations,  and  updates  applied  to 
several  relations. 

Finally,  further  work  is  needed  to  investigate  the 
performance  of  view  processing  strategies  in  the  presence  of 
overlapping  views  over  the  same  relation. 
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APPENDIX  A.   DATA  GENERATION  PROGRAM 
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/♦Author:  Curtis  Barefield         */ 

/♦Title:  Data  Generation  Program  */ 

/•version:  MS  C  6.0  /<:♦♦    (T2)     */ 

/•created:  17  June  91  */ 

/•updated:  11  Aug  91  •/ 

/*.*.***.*******•..************.*****.***•*.*****.****.***....*..***.*.*.******•*•**.**»*** 

*  This  program  was  written  to  replace  the  previous  hardwired  test  database  * 

*  generating  program  with  a  more  generic  program.  This  program  generates  the  text  records* 

*  used  to  create  the  database  used  to  test  the  view  materialization  * 

*  strategies  purposed  by  Professor   Magdi  Kamel.  The  associated  test  * 

*  pg  has  been  written  by  Lt.  Jesse   South.  USN.  a  CSM  student  in  class  PL03.  * 
.*.**•*••******•*..**.****...**.«*••*.***************•***..*..*********•*..*.***.**..***.*/ 

#include  <stddef.h> 

#include  <stdio.h> 

#include  <stdlib.h> 

# include  <time.h> 

#include  <ctype.h> 

#include  <string.h> 

#define  size    16   /*  sets  buffer  size  for  output  file  name  */ 

#define  ALPHA   1  /*  set  buffer  for  type/info  values  */ 

#define  BOUND   6  /*  set  buffer  for  upper/lower  bounds*/ 

/•DEBUG  TOGGLE  */ 

#define  TROUBLESHOOTING  0 

/*  1  sends  debug  data  to   output.txt  file  */ 

/*  prints  data  used  for  debugging  pg  */ 

void   print_random_generator_array (int.  int*) .  print_bounded_array ( int ,  long*): 

/*  used  to  read  data  into  structs  */ 

int  rand().  count,  incrementl,  increment2.  incretnent3.  t: 

long  in.  resultsl.  results2.  results3: 

/*  modules  used  to  generate  random  values  */ 

unsigned  timer.  time_hack( ) : 

void  srand ( unsigned  int): 

FILE  *input  file.  *output_f ile;  /*  file  pointers  for  text  files  */ 
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/***##*******#*************#****###5XRUCTS**********************************************************/ 

struct        f ield_attributes 
( 

char   f ield_type [ALPHA] : 

int    field_width; 

char   f ield_inf ormation [ALPHA]  ; 

char   lower_bound [ BOUND ] : 

long   increment: 

char   upper_bound[ BOUND] ; 

struct        f ield_attributes*next; 
}ATTRIBUTE: 
struct  spec_type 
{ 

long    number_of_records: 

int     number  of  fields: 

char    f ile_name[size] : 

struct  f ield_attributes     *f irst_f ield: 

struct  spec_type     *next: 
}SPECIFICATION; 

/*  declares  modules  used  to  generate  attributes  */ 
void  type_alpha(  struct  f ield_attributes  *) : 
void  type_numeric(  struct  f ield_attributes  *)  ; 
void  bounded_sequential_array (  long.  long,  int): 
void  type_alphanumeric(  struct  f ield_attributes  *); 
void  counter (  int,  int); 
void  random_generator ( int ,  int,  int): 

void  print_database_specif ications( struct  spec_type  *) ; 
void  random_long_array ( long,  long,  long): 
struct  spec  type  *list  =  NULL: 
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void   main( ) 
{ 

char  f ile_name [size] .  field_type [ALPHA] .  f ield_inf ormation [ALPHA] : 
char  c_lower_bound [BOUND] .  c_uppe r_bound[ BOUND ] : 
long  number  of  records: 

int   field_width.  number_of_f ields.  i.  increment: 
/*  creates  and  defines  ptrs  used  for  linked  lists  */ 
struct  f ield_attributes     *new_attribute=  NULL: 
struct  f ield_attributes     *next_field  =  NULL: 
struct  f ield_attributes     *end  =  NULL: 
struct         spec_type      *new_spec=  NULL; 
struct         spec_type      *top_spec   ■  NULL: 
struct  spec_type     *next_spec=  NULL: 
list  =  NULL: 
input_f ile=  fopen( "data_in" . "r" ) : 
if  (  (input_f  lie)  "NULL)  ( 

printf("CAN  NOT  OPEN  INPUT  FILE\n" ) : 
}  /*opn  input  file*/ 

/*  loads  structs  and  creates  linked  lists  */ 
whileO (feof (input_file) ) ) 
< 

/*  allocates  memory  for  each  specification  struct  */ 
new_spec  ■  (struct  spec_type  *)malloc( sizeof ( struct  spec_type)): 
if(top_spec  =  =  NULL)   /*  sets  ptr  to  top  of  linked  list  */ 
< 
top_spec  =  new_spec; 
next_spec  ■  new_spec: 
list      *  new_spec: 
} 

else 
{ 
next_spec->next  »  new_spec:/*  sets  ptr  to  next  spec  */ 
next_spec  ■  next_spec->next ; 
) 
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/*  read  in  data  and  loads  structs  */ 

if(l  !=  (fscanf ( input_file. "%ld" . &number_of_records) ))  ( 
printff"  UNEXPECTED  VALUE.  PROGRAM  TERMINATED."): 
exit (0) : 
> 
next  spec->number  of  records=number  of  records: 
if(l  !=  (fscanf ( input_f ile. "%ld" . &number_of_f ields ) ) )  ( 
printff  UNEXPECTED  VALUE.  PROGRAM  TERMINATED."); 
exit(0) : 
) 
next_spec- >number_of_f ields=number_of_f ields  : 
if(l  !=  (fscanf ( input_f ile. "%s" . f ile_name) )) ( 

printff  UNEXPECTED  VALUE.  PROGRAM  TERMINATED."); 
exit(0) ; 
I 
strcpy (next_spec->  f ile_name. f ile_name) ; 
next_spec->first_f ield  ■  NULL: 
next_spec- >next  =  NULL: 

/*  create  linked  list  for  attributes  */ 
for(i=0:  i<next_spec->number_of_f ields:  +*i) 
{ 

/*  allocates  memory  for  attribute  struct  */ 
new_attribute  =(struct  f ield_attributes  *)  malloc  (sizeof ( struct  f ield_attributes ) 
if (next_spec->first_field  "  NULL) 
< 

next_spec->f irst_f ield  *  new_attribute:     /•  sets  ptr  to  top  of  list  */ 
end  =  next_spec->f irst_f ield: 
} 

else 
{ 

end->next  «  new_attribute;  /*  sets  ptr  to  next  attribute  */ 

end  ■  end- > next: 
} 

/*  load  attribute  struct  */ 
if(l  !-  (fscanf (input_file."*s".field_type) ) ){ 

printff "  UNEXPECTED  VALUE.  PROGRAM  TERMINATED."); 

exit(0); 

> 
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strcpy (end-> f ield_type. f ield_type) ; 

if(l  !■  (fscanf (input_file. "%d".&field_width) ) ) ( 

printf("  UNEXPECTED  VALUE.  PROGRAM  TERMINATED."); 

exit(O) : 

) 
end- >  f ield_width  =  f ield_width; 
if(l  !=  (fscanf ( input_f ile. "%s" . f ield_inf ormation) ) ) ( 

printf("  UNEXPECTED  VALUE.  PROGRAM  TERMINATED."); 

exit(O) ; 

I 
strcpy (end- > fie ld_informat ion. f ield_inf ormation) ; 
if(l  !=  (fscanf (input_file."%s".c_lower_bound) )) ( 

printfC  UNEXPECTED  VALUE.  PROGRAM  TERMINATED."); 

exit(O) : 

} 
strcpy (end- > lower_bound. c_lower_bound) ; 
if(l  !=  (fscanf (input_file, "%d" . Sincrement ) ))  { 

printfC  UNEXPECTED  VALUE.  PROGRAM  TERMINATED."): 

exit(O) : 

> 
end- > increment  =  increment; 
if(l  !=  (fscanf ( input_f He. "%s" ,c_upper_bound) )) ( 

printfC"  UNEXPECTED  VALUE.  PROGRAM  TERMINATED."); 

exit(O) ; 

} 
strcpy (end- >upper_bound. c_upper_bound) ; 
end  ->next  =NULL; 
> 
} 
f close ( input_f ile ) : 

/*  Redirects  monitor  output  to  text  file  called  output.txt  */ 
#if   TROUBLESHOOTING 

f reopen ("output. txt" . "w" , stdout) : 

print_database_specif ications( list )  ; 

t=0: 

#define  DEEP_DEBUG    1 
#endif 
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/*  uses  system  time  to  generate  random  number  */ 
timer= time_hack( ) ; 
srand( timer) ; 

/*  uses  linked  list  to  read  each  spec  (BEGIN  PG  EXEC  )  */ 
whiledist  !=  NULL) 
( 

resultsl=0:  /*  sets  module  "counter"  to  zero  */ 

results2=0; 

results3=0; 

incrementl=0 : 

increment2=0: 

increment3=0: 

output_f ile=  f open( list->  f ile_name.  "w"  )  : 

if ( (output_file)==NULL)  ( 

printfC'CAN  NOT  OPEN  OUTPUT  FILE\n" ) : 
exit(O) ; 
)  /*opn  output  file*/ 

/*  BUILD  DATABASE  TEXT  FILES  */ 
for(in=0:in< list->number_of_records; ♦  ♦in) 
{ 

next_field  =  list->f irst_f ield; 

count=0; 

while(next_field  !»NULL) 

{ 

if (next_field  ->  f ield_type[0] == ' A' ) 
{ 

type_alpha(  next_f ield) : 
> 

else  if (next_field  ->  f ield_type[0] =='N' ) 
< 

type_numeric(  next_field) : 
} 

else  if  (next_field  ->  f ield_type[0]  ==■  'Op  ) 
{ 

type_alphanumeric(next_f ield) : 
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next_field  =  next_f ield- >next : 
if (next_field  ! =NULL) { 

fprintf (output_f ile. ":") ; 
> 
} 

fprintf (output_f ile. "\n" ) : 
) 

list  =  list->next: 
f close (output_f ile) ; 
) 
exit(O) : 

} 

/•••***.*.**.****•****..•*.*...»**.*.....*... •**••*••••*••••• 

*  Time  Hack  uses  the  system  clock  to  provide  the  seed  value    * 

*  for  the  rand  and  srand  library  functions.     * 

unsigned  time_hack() 
{ 

clock_t    rand_input; 
unsigned  seed_input: 
rand_input=clock( ) : 
seed_input- (unsigned) rand_input: 
return ( seed_input) ; 
} 
/*** •**•***•********••**••**#•*•••••*•**•***•*•**••*****••**•**• 

*  Bounded  sequential  array  * 

*  creates  a  bounded  array  which  is  used  with  the  rand()  library   * 

*  function  to  select  from  a  user  specified  number  of  array       * 

*  elements  to  return  a  value  from  within  that  array.  * 
•••••••••A**************************************************************/ 

void  bounded_«equential_array (long_increment.  long_lower_bound,  int  number_of_values  ) 

( 

int  m,  long_index: 

long   *long_storage . long_low; 

/*  allocate  storage  space  in  memory  for  array  */ 

long  storage= ( long*) calloc( number  of  values. sizeoft long) ) ; 
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/*  fill  array  */ 

long  storage [0] =  long  low=long_lower_bound: 

for(m=l:  m<  number_of _values:  m*+) 

( 

long  storage  [n\]  =long_low  +=long_increment : 
) 

/*  select  value  fm  array  using  generic  random  generator  */ 
long  index= ( rand ( ) %number_of _values ) : 

fprintf (output_f ile . "%051d" . long_storage [ long_index] )  : 
#if  DEEP_DEBUG 

if (t<l)( 

print_bounded_array (number_of_values .  long_storage) : } 

t++ ; 
#endif 


*  Type  alpha  generates  alpha  character  string  to  fill  user        * 

*  determined  field  size  * 

void  type_alpha(  struct  f ield_attributes  *next_field) 
{ 

char  c; 

static  char  start='A'-l: 

int  J.  increment.  lower_bound.  upper_bound: 

int  number_of_values: 

long_increroent .  long_lower_bound: 

/*******  Conversions  for  Random  Generator  ***•****»***/ 

lower_bound= (int) (next_f ield  ->lower_bound[0] )  : 

upper_bound» ( int ) (next_f ield  ->upper  bound[0]): 

increment*next_f ield  - >  increment ; 

/***»*«**»**#»  conversions  for  Bounded  Array  ********/ 

long_lower_bound= (atol ) (next_field  -> lower_bound) ; 

long_increment*next_field  - > increment; 

number_of_values= (atoi ) (next_f ield  ->upper  bound); 

if (next_field  -> f ield_information[0]  "  ' R' )  /•  Random  field  selected  •/ 

{ 

random_generator( increment ,  lower_bound.  upper_bound) ; 

> 
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else  if (next_f ield  - > f ield_information[0] == ' B ' )  /*  Bounded  sequential   field 

selected  */ 
( 

bounded_sequential  array ( long_increment .  long  lower  bound,  number  of  values) 
) 
else  if ( next_f ield  - >f ield_inf ormation[0] == ' X' ! | ' D ' ) 

/*  countup  counter  places  chars  'A'-'Z'  in  field  */ 
{ 

for( j=0:J<next_field  ->f ield_width: ♦+ j )   /*  sets  field  width  */ 
( 

if (start  >=Z' ) 
( 

start=  '  A'  : 
) 

else 
( 

♦♦start : 
> 

fprintf (output_f ile. "%c" . start) :  /*  prints  alpha  char  to 
output  file  */ 
I 
} 

} 
•**♦♦**«»♦»•«**••#•♦»*•«*•*»•♦»•»♦»*♦•***♦*»****•*«»*«*»*•*•*«»•* 

*  Type  numeic  generates  numeric  chararters  to  fill  user    * 

*  determined  field  size  * 
ft*************************************.************.**************/ 

void  type_numeric(  struct  f ield_attributes  *next_field) 

( 

char  c; 

static  8tart»'0"-l: 

int  j.  lower_bound.  increment.  upper_bound: 

int  number_of_values,  increment_c.  lower_bound_c: 

long_increment .  long_lower_bound.  increment_long_array : 

long  lower_bound_long_array.  upper_bound_long_array: 
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/*•*******«***  Conversions  for  Counter  ***************/ 
increment_c  ■  next_f ield- > increment : 
lower_bound_c  = (atoi ) (next_f ield- > lower_bound) ; 
/*******  Conversions  for  Random  Generator  ************/ 

lower_bound= ( int ) (next_f ield  - > lower_bound[0] ) : 

upper_bound= ( int ) (next_f ield  ->upper_bound[0] ) : 

increment=next_f ield  - >  increment : 

/*************  conversions  for  Bounded  Array  ********/ 

long_lower_bound= (atol ) (next_f ield  ->lower_bound) : 

long_increment=next_f ield  -> increment: 

number_of_values= (atoi ) (next_f ield  ->upper_bound) : 

/**•»**«***»**  Conversions  for  Random  Long  Array  ********/ 

lower_bound_long_array  =( atol ) (next_f ield  - >lower_bound) : 

increment  long  array    =(long)(next  field  ->increment ) : 

upper_bound_long_array  = (atol ) (next_f ield  - >upper_bound) ; 
if (next_field  -> f ield_information[0] « ' S ' )    /*  sequential  counter  */ 
( 

counter ( increment_c.  lower_bound  c) : 
count**: 
> 

else  if (next_field  -> f ield_information[0] == ' R' )  /*  Random  */ 
( 

random_generator (  increment.  lower_bound,  upper  bound): 
} 

else  if (next_f ield  ->f ield_information[0] ==  '  B  '  )  /*  Bounded  sequential  */ 
{ 

bounded_sequential_array (long_increment.  long_lower_bound.  number_of_values  ); 
} 

else  if (next_f ield  ->f ield_inf ormationfO] « 'X' )  /*  Random  long  array  */ 
{ 

random_long_array ( increment_long_array ,  lower_bound_long_array .  upper_bound_long_array 
> 
else  if (next  field  ->field  infonnation[0] *= ' 0 ' ) 
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/*  countup  counter,  fills  field  with  '0'  -  '9'  in  sequence  */ 
{ 

fort j=0: J <next_field  - > f ield_width: ** j )   /*  sets  field  width  */ 
( 

if (start  >  =  '9  '  ) 
( 

start=  '0' ; 
> 

else 
{ 

♦♦start : 
> 

fprintf (output_f lie. "*c" . start) ;  /*  prints  numeric  char  to 
output  file  */ 
> 
} 
} 

*  Type  alphanumeric  generates  alphanumeric  chararters  to  fill    * 

*  user  determined  field  size  * 

void  type_alphanumeric( struct  f ield_attributes  *next_field) 
{ 

char  c.  d; 

int   j.  k   /».  count=0*/  : 

static  char  alpha* 'A' -1: 

static  char  numeric  '  0  ' -1 ; 

int  alpha_field_width.  numeric_f ield_width: 

alpha_field_width«(atoi) (next_field  ->lower_bound) ; 

numeric  field  width* (atoi) (next  field  ->upper  bound): 
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for ( j=0; j <alpha_f ield_width: ++j )    /*  generates   alpha  chars  'A'-'Z'  */ 

{ 

if (alpha  >='Z' ) 

{ 

alpha= '  A'  : 
) 

else 
{ 

♦♦alpha : 

> 

f print f(output_f lie. "%c" .alpha) ; 
> 
f or ( k=0: k<numeric_f ield_width; +*k)    /*  numerics  '0'-'9'  */ 

{ 

if (numeric  >= ' 9 ' ) 

( 

numeric* ' 0' : 

} 

else 

{ 

♦♦numeric: 

} 

fprintf (output_f ile. "%c" .numeric) ; 
} 
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*  Random  generator  takes  a  lower,  upper  and  * 

*  increment  int  value,  creates  an   numeric  array   and        * 

*  uses  the  functions  rand()  and  srand()  to  * 

*  select  an  array  element  which  maybe  converted  to  a         * 

*  alphanumeric  char  for  printing  to  the  output  file.         * 
..A************************.******..***.**************.******. ********.*/ 

void  random  generator)  int  increment,  int  lower_bound.  int  upper  bound) 
< 

/*  Declare  module  data  elements  */ 
int   numeric_index: 
int   output_f rom_array; 

int   generate_numeric_array ( int. int. int); 
char  alnum_character  output; 
/*  Compute  array  size  */ 

numeric_index=  ( (upper_bound-lower_bound) /increment ) ; 
/*  Determine  if  a  set  of  bounded  random  numbers  are  required  */ 

if ( ( ( lower_bound>47 ) &&(upper_bound<58) )  | |   ( lower_bound>64 )fiS(upper_bound<91 ) ) 
( 

output_f rom_array=  generate_nuraeric_array (numeric_index.  increment.  lower_bound) 
alnum_character_output= (char) (output_f rom_array) ; 
fprintf  (output_f  ile.  "9s04c"  .  alnum_character_output )  ; 
} 

else 
< 

output_f rom_array*  generate_numeric_array (numeric_index.  increment.  lower_bound) : 
fprintf (output_f ile.  "*04d".  output_from_array) ; 
) 
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*  Random  long  array  takes  a  lower,  upper  and  * 

*  Increment  long  value,  creates  an  numeric  array   and     * 

*  uses  the  functions  rand()  and  srand()  to       * 

*  select  an  array  element  which  is   printed  to  * 

*  the  output  file.  * 
•A***************.***.********...*.*. •***********•••*•****»*•********..*/ 

void    random  long  array!     long    increment_long_array .     long    lower_bound_long  array.     long 

upper_bound_long_array ) 

{ 

/*  Declare  module  data  elements  */ 
int  m.  index: 
long  numeric_index.  low: 
long  *output_f rom_array : 
long  output_f rora  random: 
/*  Compute  array  size  */ 

numeric_index=  ( (upper_bound_long_array-lower_bound_long_array ) /increment_long_array ) : 
if (numeric_index<  20) 
{ 
/*  Allocate  memory  block  for  long  array*/ 

output_f rom_array=  ( long*)calloc(numeric_index. sizeof ( long) ) : 
/*  Set  lower  bound  of  array  */ 

output_f rom_array [0] =  low=  lower_bound_long_array : 
/*  Load  array  */ 
for  (m=l  ;m<numeric_index:in+*) 
{ 

output_f rom_array [m] =  low+=  increment_long_array : 
) 

index*  ( rand( ) %numeric_index) : 

fprintf (output_file. "*041d" . output_f rom_ar ray [index] )  : 
> 

else 
{ 
output_f rom_random=  (l*(rand(  )Ssupper_bound_long_array)  )  : 
fprintf (output_f ile. "%041d" . output_f rom_random) : 
) 
) 


72 


/********* *********«.*....♦**.***..*«.**.* ft**.**.************* 

*  Counter  uses  lower  bound  and  increment  to  act  as  * 

*  a  sequential  counter  for  max  of  numbers  per  spec_type.     * 

void  counter!  int  increment_c.  int  lower_bound_c) 
< 

/**********«**  FIRST  COUNTER  ***********/ 

if ( (in==0)&&(resultsl==0)&S(count==0) ) 
{ 

result si  =  lower_bound_c: 

incrementl  =  increment_c: 
I 

else  if(count==0) 
( 

resultsl  ■  resultsl*incrementl : 
} 
if  (count"0)  { 

fprintf (output_file, "%041d" . resultsl) : } 
/**»**♦*****  SECOND  COUNTER  *•****«*****/ 
if  (  (in==0)&&(results2"0)&&(count  =  =  l)  ) 
{ 

results2  =  lower_bound_c; 

increment2  ■  increment_c: 
) 

else  if(count==l) 
< 

results2  ■  results2+increment2; 
) 
if (count==l) { 

fprintf (output_file."%041d".results2)  :  } 
/•**••*•*•**•  THIRD  COUNTER  ************/ 
if ( (in=»0)&&(results3«0)S&(count"2) ) 
< 

results3  »  lower_bound_c: 

increment3  *  increment_c: 
} 
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else  if(count==2) 

< 

results3  =  resul ts3» increments : 
> 
if (count==2) ( 

fprintf (output_f ile. "%041d" . resul ts3) : ) 
/oil**************************************/ 


*  Generate  numeric  array  produces  a  bounded 

*  array  and  uses  the  rand()  function  to  simulate  a  real 

*  random  number  generator 

int  generate_numeric_array ( int  numeric_index.  int  increment, 
int  lower  bound) 


{ 


int  m.  index: 

int   *numeric_storage,    low; 

nuttier  ic_storage  =  (  int*)calloc(numeric_index  ,  sizeof  (  int)  )  ; 

numeric_storage [0] =  low  ■  lower_bound; 

for(m=l:  m<numeric_index;  m*+) 

{ 

numeric_storage[m] »low  +«  increment: 
> 

index* ( rand( ) %nutneric_index) : 
return (numeric_storage[ index] ) : 
#if  DEEP_DEBUG 

if(t<l)< 

print_random_generator  array (numeric_index.  numeric_storage) ; } 

t  +  +: 
#endif 
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/*•**••••••••*  print  RANDOM  GENERATOR  ARRAY  ******•*****«***********•**»**/ 

void  print_random_generator  array(int  numeric_index. int  *numeric_storage ) 
( 

int  i: 

for ( i=0: i (numeric  index : i** ) ( 

printft"  array[%d]  ■  *d\n" . i , numeric_storage [ i ] )  ; 

} 
} 

/»•********«*♦***  print  BOUNDED  ARRAY  ************************************/ 
void  print  bounded_array ( int  number_of_values. long  *long  storage) 
{ 

int  i: 

for ( i  =  0: i <number_of_values;  i*+ )  { 

printfl"  array[Sd]  =  %051d\n" . i . long_storage[i ] )  ; 

} 
} 

*      Prints  specifications  including  all  attribute  link  lists     * 
•A*******************************************************************/ 

void  print_database_specifications( struct  spec  type  *list) 

( 

struct  field_attributes     *next_field: 

int  i.  1=0: 

printf ( "Printing  link  lists  for  generic  database  generator\n" ) : 

whiledist  !  =NULL) 

< 

i=0; 

printf ("\tSPEC  %d\n".+*l): 

printf ( "\tThere  will  be  %ld  records\n" . list->number_of_records) 

printf ("\Tthere  will  be  %d  f ields\n" . list->number_of_f ields) ; 

printf ("\tThe  file  name  is  %s\n" . list-> f ile_name) : 

next_field  »  list->f irst_field: 

while (next_field  !-NULL) 

< 

printf ("\tField  *d: \n" . ♦  ♦i ) : 

printf ("\t%s  is  field  type\n" ,next_f ield->f ield_type) ; 

printf ("\t%d  is  field  width\n" . next_f ield->f ield_width) ; 
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printf("\t%s  is  field  inf o\n" . next_f ield- > field_inf ormation) 
printf ( "\t%s  is  lower  bound\n" . next_f ield-> lower_bound) : 
printf ( " \t%ld  is  increment\n" . next_f ield- > increment ) ; 
printf("\t%s  is  upper  bound\n" . next_f ield->upper_bound) ; 
next_field  =  next_f ield- >next : 

} 

list  =  list->next: 
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APPENDIX  B.   VIEW  MATERIALIZATION  SIMULATION  PROGRAM 
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/*  Title  :  View  Materialization  Simulation  (vsgxpdp7)  */ 

/*  Author  :  Jesse  T.  South  */ 

/*  Date  :  17  June  1991  */ 

/*  Revised         :  25  July  1991  */ 

/*  Modified        :  for  general  expressions  22  AUG  by  Curtis  Barefield      */ 

/*  Purpose         :  Thesis  Research  */ 

/*  System  :  IBM  80286  clone/  80386SX  */ 

/*  Compiler  :  Microsoft  C  6.0,  INGRES  precompiler . (Borland  C++)  */ 

/*  Description        :  The  program  is  part  of  a  thesis  */ 

tinclude  <stdio.h> 

#include  <stdlib.h> 

#include  <time.h> 

(♦include  <math.h> 

exec  sql  include  sqlca: 

#define  size  16 

#define  dbinfo  "info.dat" 

#define  cntrlfl  "cntrl.dat" 

#define  update_file  "data_in" 

#define  finrslt  "fnlrslt.dat" 

#define  runrslt  "rnrslt.dat" 

exec  sql  begin  declare  section; 

#define  empinfo  "empdat.dat" 

tdefine  posinfo  "posdat.dat" 

#define  skilinfo  "skildat.dat" 

"♦define  updatinfo  "update.dat" 

exec  sql  end  declare  section; 

void  open_files(FILE**.  FILE**.  FILE**): 

void  close_files(FILE**.  FILE**,  FILE**): 

void  init_test_database( int) : 

void  scan_dbinfo(long*.  long*,  long*,  int*.  int*.  int*,  long*,  long*,  long*): 

void  create_tables( void) ; 

void  create_viewa(int) : 

void  create_update_table(void) : 

void  copy_base_tables( void) : 

void  copy_semi_n_full_mats( int) ; 

void  create_table_index(void) : 

void  module_qm(char.  int,  long,  double*.  FILE*): 

void  module_stn(char,  int,  long,  double*,  FILE*): 
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void  module_fm(char .  int.  long,  double*.  FILE*): 

void  write_file_headings(char*.  char*.  FILE*.  FILE*): 

void  write_run_result ( char .  char.  int.  long,  double,  long,  FILE*): 

void  wri te_f inal_result ( int .  int.  long.  int.  long,  long.  long,  float, 

float,  float,  float,  float,  double,  double,  double.  FILE*.  FILE*) 
void  compute_avg_time ( int .  double*,  double*,  double*): 
void  compute_fv_and_fq_and_P ( int ,  int.  int.  int.  float*,  long.  long.  long. 

long,  float*,  int.  int.  float*); 
void  compute  table_counts ( long* .  long*,  long*,  long,  float*,  float*): 
void  ref resh_update_text_f ile( long.  long,  long): 
void  main(void) 
{ 

int  K.  0.  updat_siz.  i.  run_cnt  =  0.  zero  *  0: 
int  vmax.  vbase.  vincr.  viewcut: 

long  ecard.  pcard.  scard.  countb,  countv,  countq: 
long  qmax.  qbase,  qincr,  querycut: 
float  fv,  fva.  fq,  fqa.  P: 
double  timeqra.  timesm.  timefm: 
char  QUERY  =  'Q'.  UPDATE  =  'K': 

char  *prm_ptr.  parameter [ 10]  .  *updt_ptr.  updat  rel[10]: 
FILE   *cntrl_fl.  *fresult_fl.  *run_rslt: 
prra_ptr  ■  &parameter [0] : 
updt_ptr  =  &updat_rel [0] : 

open_f iles(&run_rslt .  &cntrl_fl,  &f result_f 1) : 
scan_dbinfo (Spcard.  &ecard.  iscard.  &vmax,  Svbase.  Svincr.  Sqmax.  Sqbase. 

fiqincr) : 
while(!feof (cntrl_fl) ) 
{ 

timeqm  -  timesm  »  timefm  *  0.0: 
countb  »  countv  »  countq  »  0: 
fscanf (cntrl_fl.  "%d   *ld  %d  %d   %d  %b   %a" .    Sviewcut.  Squerycut.  &K.  &Q. 

&updat_siz,  prm_ptr,  updt_ptr) : 
if  (run_cnt  "    zero)  write_f ile_headings (prm_ptr .  updt_ptr.  fresult_fl. 

run_rslt ) : 
init_test_databa«e( viewcut) ; 
run_cnt*+; 
printf("\n  run  #  *d\n" .  run  cnt) : 
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ford    =    0;    i    <    K:    i++) 

( 

ref resh_update_text_f ilelpcard.    i.    updat_siz): 
module    qm(UPDATE.    viewcut.    querycut.    fitimeqm.     run_rslt) 
module_sm ( UPDATE .    viewcut.    querycut.    fitimesm.     run_rslt) 
module    fm(UPDATE.    viewcut.    querycut.    fitimefm.     run_rsltl 
) 
ford    =   0:    i    <    Q:    i++) 

( 

module_qro(QUERY.    viewcut.    querycut.    fitimeqm,     run_rslt); 
module_sm (QUERY,    viewcut.    querycut.    fitimesm.    run_rslt); 
module_fm (QUERY,    viewcut.    querycut.    fitimefm.     run_rslt); 
) 
compute_avg_time  (Q.    fitimeqm.    fitimesm.    fitimefm); 
compute_fv_and_f q_and_P( vraax.    vbase.    vincr.    viewcut.    fifv,    qmax.    qbase. 

qincr.    querycut,    fifq.    K.    Q,    fiP); 
compute_table_counts (ficountb.    ficountv.    ficountq,    querycut.    fifva,    fifqa) : 
write_f inal_result ( run_cnt .    viewcut.    querycut.    updat_siz.    countb.    countv. 
countq,     fv.    fva.     fq.    fqa.    P.     timeqm.    timesm.     timefm. 
fresult_fl.    run_rslt): 
exec   sql    disconnect: 
system ( " rmingres" ) : 
) 
close_f iles(firun_rslt .    &cntrl_fl.    &f result_f 1) : 
printf ( " \ndisconnect   complete\n" ) : 
} 
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void  ini t_test_database ( int  viewcut) 
( 

system! "destroydb  magdi"); 
system ( "createdb  magdi"); 
system! "addingres  -B  -D64000"): 
exec  sql  whenever  sqlerror  stop: 
exec  sql  connect  magdi; 
create_tables( ) ; 
create  views ( viewcut ) ; 
copy_base_tables ( ) ; 
copy_semi_n_full_mats (viewcut ) : 
create_table_index( ) ; 
} 
void  open_files(FILE  **run_rslt,  FILE  **cntrl_fl.  FILE  **f result_f 1 ) 
{ 

*cntrl_fl  ■  fopen(cntrlfl.  "r"): 
*fresult_fl  =  fopen( f inrslt .  "a"): 
*run_rslt  =  f open( runrslt .  "a"); 
if ( ( !*run_rslt)  ||  (!*cntrl_fl)  !|  ( !*fresult_f 1) ) 

{ 

printf ( "\nERROR:  control  or  output  files  did  not  open"): 

fclosealK)  : 

exec  sql  disconnect; 

exit(l) : 

} 
} 
void  close_files(FILE  **run_rslt.  FILE  **cntrl_fl.  FILE  **f result_f 1 ) 
{ 

int  i: 

f printf (*fresult_fl. "\n" ) ; 

for  (i=0:i<80:i++)  fprintf (*f result_f 1. "•" )  : 
fclose  (*run_rslt); 
f close  (*cntrl_fl): 
fclose  (•fresult_fl) ; 
} 
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void  scan_dbinfo ( long*  ecard.  long*  pcard.  long*  scard,  int*  vmax.  int*  vbase. 
int*  vincr.  long*  qmax.  long*  qbase.  long*  qincr) 
{ 

FILE*  db_info; 

db_info  ■  f open(dbinfo.  "r"); 
if ( :db_info) 
{ 

printf ("\NERROR:  dbinfo  file  did  not  open  "): 
fclosealK)  : 
exec  sql  disconnect; 
exit(l) ; 
) 
fscanf (db_info.  "%ld  %ld  %ld\n" .  &*ecard.  S*pcard.  &*scard) : 
fscanf (db_info.  "%d   %d   %d\n" .  &*vmax.  &*vbase,  &*vincr) : 
fscanf  (db_info.  "*ld  %ld  %ld"  .  &*qmax.  &*qbase.  &*qincr) : 
f close (db_info) ; 
} 
void  create_tables( ) 

< 
/*  create  query  modification  tables  */ 
exec  sql  create  table  posqm 

(e_num  integer2.  snum  integer2.  level  integerl.  keyno  integer2, 
accinfo  c86) : 
exec  sql  create  table  empqm 

(e_num  integer2.  dnum  integer2.  ename  c20.  address  c70. 
salary  integer4.  title  c30.  Jobdesc  c60) : 
exec  sql  create  table  skillqn 

(snum  integer2.  sname  c20.  stype  c34): 
/*  create  semi-materialization  tables  */ 
exec  sql  create  table  possm 

(e_num  integer2.  snum  integer2.  level  integerl.  keyno  integer2. 
accinfo  c86) : 
exec  sql  create  table  empsm 

(e_num  integer2.  dnum  integer2.  ename  c20.  address  c70. 
salary  integer4.  title  c30.  jobdesc  c60); 
exec  sql  create  table  skillsm 

(snum  integer2.  sname  c20.  stype  c34): 
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exec  sql  create  table  pos  prim 

(e  num  integer2.  keyno  integer2): 
exec  sql  create  table  emp  prim 

(e_num  integer2,  ename  c20.  salary  integer4): 
/*  create  full  materialization  tables  */ 
exec  sql  create  table  posfm 

(e_num  integer2.  snum  integer2.  level  integerl.  keyno  integer2. 
accinfo  c86) : 
exec  sql  create  table  empfm 

(e  num  integer2.  dnum  integer2.  ename  c20.  address  c70. 
salary  integer4.  title  c30.  jobdesc  c60) ; 
exec  sql  create  table  skillfm 

(snum  integer2.  sname  c20.  stype  c34): 
exec  sql  create  table  full_raat 

(e_num  integer2.  ename  c20.  salary  integer4.  keyno  integer2): 
> 
void  create_views ( int  viewcut) 
{ 

exec  sql  begin  declare  section: 

int  view_cut; 
exec  sql  end  declare  section: 
view_cut  =  viewcut: 

exec  sal  create  view  full_view(e_num.  ename.  salary,  kevno )  as 

select  empgm , e_num .  empqm . ename ,  empqm. salary.  outer2. kevno 

from  empqm.  posqm  outer2 

where  exists 

(select  « 


from  DOBam  lnner2 

where  empqm . e_num  »  inner2.e_num 

and  outer2. kevno  -  lnner2 .kevno 

and  inner 2 . level  >°  :vlew  cut); 
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exec  sal  create  view  sm_view(e_num.  ename.  salary,  kevno)  as 

select  emp_prim,e_num.  emp_prim. ename .  erop_prim. salary .  outer4.kevno 

from  emp_prim.  pos_prim  outer4 

where  exists 

(select  * 


from  Pos_prim  inner4 

where  emp_prim. e_num  =  inner4 . e_num 

and  outer4.keyno  =    inner4 .kevno) : 

) 

void  create_update_table ( ) 
( 
exec  sql  create  table  update_tbl 

(e_num  integer2.  snum  integer2.  level  integerl.  keyno  integer2. 
accinf o  c86) ; 
exec  sql  copy  table  update  tbl 

(e_num  ■  cOcolon.  snum=  cOcolon,  level  =  cOcolon. 
keyno  *  cOcolon.  accinf o  =  cOnl) 
from  :updatinfo: 
> 
void  copy_base_tables( ) 
{ 
exec  sql  copy  table  posqm 

(e_num  =  cOcolon.  snum  ■  cOcolon.  level  =  cOcolon.  keyno  =  cOcolon. 
accinf o  ■  cOnl ) 
from  :posinfo: 
exec  sql  copy  table  possm 

(e_num  ■  cOcolon.  snum  =  cOcolon.  level  »  cOcolon.  keyno  =  cOcolon. 
accinf o  ■  cOnl) 
from  :posinfo: 
exec  sql  copy  table  posfm 

(e_num  »  cOcolon.  snum  *  cOcolon.  level  »  cOcolon,  keyno  ■  cOcolon. 
accinf o  »  cOnl) 
from  :posinfo: 
exec  sql  copy  table  empqm 

(e_num  *  cOcolon.  dnum  »  cOcolon.  ename  ■  cOcolon.  address  =  cOcolon. 
salary  »  cOcolon.  title  «  cOcolon,  Jobdesc  ■  cOnl ) 
from  :empinfo: 
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exec  sql  copy  table  empsm 

(e  nun  =  cOcolon,  dnum  ■  cOcolon.  ename  =  cOcolon.  address  ■  cOcolon. 
salary  =  cOcolon.  title  =  cOcolon.  jobdesc  ■  cOnl) 
from  :empinfo: 
exec  sql  copy  table  empfm 

(e_num  =  cOcolon.  dnum  =  cOcolon.  ename  »  cOcolon.  address  •  cOcolon. 
salary  =  cOcolon.  title  =  cOcolon.  jobdesc  ■  cOnl ) 
from  :empinfo; 
exec  sql  copy  table  skillqm 

( snum  =  cOcolon.  sname  ■  cOcolon,  stype  =  cOnl ) 
from  tskilinfo; 
exec  sql  copy  table  skillsm 

(snum  «  cOcolon,  sname  =  cOcolon.  stype  ■  cOnl ) 
from  :skilinfo: 
exec  sql  copy  table  skillfm 

(snum  ■  cOcolon.  sname  ■  cOcolon.  stype  ■  cOnl ) 
from  :skilinfo: 
} 
void  copy_semi_n_full_mats( int  viewcut) 
{ 

exec  sql  begin  declare  section: 
int  view_cut: 

exec  sql  end  declare  section: 
view_cut  =  viewcut: 

exec  sql  insert  into  pos_prim  (e_num.  keyno) 
select  e_num.  keyno 
from  possm 

where  level  >■  :view_cut; 
exec  sql  insert  into  emp_prim  (e_num,  ename.  salary) 
■elect  e_num,  ename,  salary 
from  empsm: 
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exec  sal  Insert  into  ful l_mat  fg_num.  ename .  salary,  kevno) 

select  empfm.e_num   gmpfm. ename .  empfm. salary .  outer2,kevno 

from  empfm.  posfm  outer2 

where  exists 

(select  * 


from  posfm  inner2 

where  empfm . e_num  =    inner2.e_num 

and  outer2.kevno  =  inner2.kevno 

and  inner2. level  >=  :view_cut): 

> 
void  create_table_index( ) 

< 

exec  sql  modify  erapqm  to  cbtree  on  e_num: 
exec  sql  modify  empsm  to  cbtree  on  e_num: 
exec  sql  modify  empfm  to  cbtree  on  e_num; 
exec  sql  modify  posqm  to  cbtree  on  level; 
exec  sql  modify  possm  to  cbtree  on  level; 
exec  sql  modify  posfm  to  cbtree  on  level; 
exec  sql  modify  emp_prim  to  cbtree  on  salary; 
exec  sql  modify  pos_prim  to  cbtree  on  e_num: 
exec  sql  modify  full_mat  to  cbtree  on  salary; 

/*   create  secondary  indexes   */ 
exec  sql  create  index  empqmdx 

on  empqm  (e_num): 
exec  sql  create  index  empsmdx 

on  empsm  (e_num) ; 
exec  sql  create  index  empfmdx 

on  empfm  (e_num): 
exec  sql  create  index  posqmdx 

on  posqm  (level): 
exec  sql  create  index  possmdx 

on  possm  ( level) : 
exec  sql  create  index  posfmdx 

on  posfm  (level): 
exec  sql  create  index  e_priodx 

on  emp_prim  (salary); 
exec  sql  create  index  p_primdx 

on  pos  prim(e  num); 
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exec  sql  create  index  f_matdx 
on  full_mat  (salary): 
) 
void  module_qm(char  cntrl_char.  int  viewcut.  long  querycut.  double  *timeqm. 
FILE  *run_rslt) 
{ 

clock_t  tstart  -    0.  tstop  =  0: 
double  elap_time: 
long  tbl_cnt  =  0: 
exec  sql  begin  declare  section: 
int  view_cut; 
long  query_cut: 
long  qnum: 
char  qname [21] ; 
long  qkeyno: 
exec  sql  end  declare  section: 
exec  sql  declare  qn>  cl  cursor  for 
select  e_num.  ename.  keyno 
from  full_view 
where  salary  >=  :query_cut: 
view_cut  ■  viewcut: 
query_cut  =  querycut: 
switch(cntrl_char) 
{ 
case  ' K' : 

create_update_table ( ) : 
exec  sql  insert  into  posqm 
select  * 
from  update_tbl: 
exec  sql  drop  update_tbl; 
break: 
case  'Q' : 

tstart  »  clock( ) : 

exec  sql  open  qm_cl : 

exec  sql  whenever  not  found  goto  closeqm_cl: 
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while( sqlca. sqlcode  ==  0) 
{ 
exec  sql  fetch  qm_cl 

into  :qnum.  :qname,  :qkeyno: 
/*   printf ( " \nnumber  =  %d" .    qnum):  */ 
tbl_cnt**; 
} 
closeqm_cl : 

exec  sql  whenever  not  found  continue: 
tstop  =  clock( ) ; 
exec  sql  close  qm_cl: 
break: 
default: 

printf ( "\nIncorrect  control  character\n" ) ; 
break: 
> 
elap_time  =  (tstop  -  tstart ) / (double)CLK_TCK: 
•timegm  =  *timeqm  +  elap_time: 
write_run_result ( 'q' .  cntrl_char.  viewcut.  querycut.  elap_time.  tbl_cnt. 

run_rslt) ; 
) 
void  module_sm(char  cntrl_char.  int  viewcut.  long  querycut.  double  *timesm. 
FILE  *run_rslt) 
{ 

clock_t  tstart  =  0.  tstop  =  0: 
double  elap_time: 
long  tbl_cnt  =  0: 
exec  sql  begin  declare  section; 
int  view_cut: 
long  query _cut: 
long  snum: 
char  sname[21]; 
long  s key no; 
exec  sql  end  declare  section: 
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exec  sql  declare  sm_cl  cursor  for 
select  e_num.  ename.  keyno 
from  sm_view 

where  salary  >=  :query_cut: 
view  cut  ■  viewcut: 
query_cut  =  querycut: 
swi tch(cntrl_char ) 
{ 
case  ' K' : 

create_update_table ( )  : 
exec  sql  insert  into  possra 
select  * 
from  update_tbl; 
tstart  =  clock( ) : 
exec  sql  insert  into  pos_prim 
select  e  num.  keyno 
from  update_tbl 
where  level  >=  :view_cut: 
tstop  =  clockf ) : 
exec  sql  drop  update_tbl: 
break: 
case  ' Q' : 

tstart  =  clockf ) ; 
exec  sql  open  sm_cl: 

exec  sql  whenever  not  found  goto  closesm_cl : 
while  ( sqlca. sqlcode  ==  0) 
{ 
exec  sql  fetch  sm_cl 

into  :snum.  : snane .  :skeyno; 
/*   printf ("\nsnum  «  %d" .    snum):  */ 
tbl_cnt++: 
> 
closesm_cl : 

exec  sql  whenever  not  found  continue: 
tstop  =  clock( ) : 
exec  sql  close  sm_cl : 
break: 
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default: 

printf ( "\Nincorrect  control  character\n" ) ; 
break: 

} 
elap_time  =  (tstop  -  tstart )/ (double ) CLK_TCK: 
*timesm  =  *timesm  *  elap_time; 
write_run_resul t ( ' s ' .  cntrl_char.  viewcut.  querycut.  elap_time.  tbl_cnt. 

run_rslt) : 
I 
void  raodule_f m (char  cntrl_char.  int  viewcut.  long  querycut.  double  *timefm. 
FILE  *run_rslt) 
{ 

clock_t  tstart  =  0.  tstop  =  0: 
double  elap_time: 
long  qcnt  =  0; 
exec  sql  begin  declare  section; 

int  view_cut: 

long  query_cut: 

long  tbl_cnt: 

long  fnum: 

char  fname [21] : 

long  fkeyno: 
exec  sql  end  declare  section: 
exec  sql  declare  fm_cl  cursor  for 
select  e_num.  ename.  keyno 

from  full_mat 

where  salary  >=  :guery_cut: 
view_cut  =  viewcut: 
query_cut  -  querycut: 
switch(cntrl_char) 

{ 
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case  ' K' : 

create_update_table ( ) ; 
exec  sql  insert  into  posfm 
select  * 
from  update_tbl: 
tstart  =  clock( ) ; 

exec  sal  drop  fulljnat: 

exec  sal  create  table  fulljnat 

(ejium  inteaer2.  enaroe  c20.  salary  inteaer4.  kevno  inteaer2): 

exec  sal  insert  into  fulljnat  (ejium.  ename.  salary,  keyno) 

select  ercpfm.ejium.  empfm.ename.  empfm. salary,  outer. kevno 

from  empfm.  posfm  outer 

where  exists 

(select  * 
from  posfm  inner 

where   empfm.ejium   =    inner.ejium 

and  outer .kevno  =    inner .kevno 

and  inner. level  >=  :viewjrutl: 

exec  sal  modify  fulljnat  to  cbtree  on  salary: 

tstop  ■  clock( ) : 
exec  sql  drop  update _tbl; 
break: 
case  'Q' : 

tstart  =  clock! ) : 
exec  sql  open  fin  cl: 

exec  sql  whenever  not  found  goto  closefm  cl: 
while  (sqlca. sqlcode  ■■  0) 
( 
exec  sql  fetch  fmjrl 

into  :fnum.  :fname.  :f keyno: 
/*   printf("\n  fnum  «  *d" .  fnum) :  */ 
qcnt++; 
} 
closefm  cl : 

exec  sql  whenever  not  found  continue; 
tstop  ■  clock ( ) : 
exec  sql  close  fmjrl; 
break: 
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default: 

printf ( "\Nincorrect  control  character\n" ) : 
break; 
) 
elap_time  =  (tstop  -  tstart) / (double) CLK_TCK; 
*timefm  *  *timefm  ♦  elap_time; 
exec  sql  select  rowtot  =  count(e_num) 
into  : tbl_cnt 
from  full_mat 

where  salary  >=  :query_cut; 
write  run_result ( ' f '  ,  cntrl_char.  viewcut.  querycut,  elap_time.  tbl_cnt. 

run_rslt ) : 
> 
void  write  f ile_headings (char*  param,  char*  updt_tbl.  FILE*  fresult_fl. 
FILE*  run_rslt) 
( 

time  t  today_t; 
time(&today_t) : 

fprintf (fresult_fl. "\n  %s      -      FINAL  RESULTS  (vsgxpdp7)  -\n" .  ctime(&today_t ) 
fprintf ( f result_f 1 . "\n  The  %e    is  the  parameter  being  tested",  param): 
fprintf (fresult_fl. "\n  The  %s   table  is  the  table  being  updated".  updt_tbl): 
fprintf (run_rslt."\n  %s      -   RUN  RESULTS  (vsgxpdp7)  -\n".  ctime(&today_t) ) : 
fprintf (run_rslt. "\n  The  %s    is  the  parameter  being  tested",  param): 
fprintf ( run_rslt. "\n  The  %s   table  is  the  table  being  updated\n" .  updt_tbl): 
} 
void  write_run_result (char  strat.  char  cntrl_char,  int  viewcut.  long  querycut. 
double  elap_time.  long  tbl_cnt,  FILE  *run_rslt) 
{ 
printf ( "\n*cm  cc«*c  vc**d  qc=*ld  et»%.21f  tc*%ld" ,  strat.  cntrl_char. 

viewcut.  querycut.  elap_time.  tbl_cnt): 
fprintf (run_rslt. "\n*cm  cc«*c  vc=%d  qc=*ld  et=*.21f  tc=%ld" . strat . 

cntrl_char.  viewcut.  querycut.  elap_time.  tbl_cnt): 
) 
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void  wri te_f inal_resul t ( int  run.  int  viewcut.  long  querycut.  int  updt_siz. 
long  countb.  long  countv.  long  countq.  float  fv. 
float  fva.  float  fq.  float  fqa.  float  P. 
double  timeqm,  double  timesm.  double  timefm. 
FILE  *fresult_fl.  FILE  *run_rslt) 
( 

printf ("\n\nRUN#  %d.    VCUT=  *d.  QCUT=  %ld.  #TUP=  %d.    BASE=  %ld.  VIEW=  %ld."\ 
"  QUERY=  %ld"  .  run.  viewcut.  querycut.  updt_siz.  countb.  countv, 
countq) ; 
printf ("\nFV=  *.2f.  FVA=  %f .    FQ=  %.2f.  FQA=  %f    P=    %.2f".fv.  fva.  fq. 

fqa.  P): 
printf ("\nTIMEQM=  %.31f  sec.  TIMESM=  *.31f  sec.  TIMEFM=  %.31f  sec\n" . 

timeqm.  timesm.  timefm): 
fprintf (fresult_fl."\n\NRUN#  *d.  VCUT=  %d.    QCUT=  *ld.  #TUP  =  %d.    BASE  =  %ld."\ 
"  VIEW=  %ld.  QUERY=  %ld"  .  run.  viewcut,  querycut.  updt_siz.  countb. 
countv,  countq); 
fprintf (fresult_fl. "\NFV=  %.2f.  FVA=  %f .    FQ=  %.2f.  FQA=  Sf  P  =  %.2f".fv.  fva. 

fq.  fqa.  P)  ; 
fprintf (fresult_fl."\NTIMEQM=  %.31f  sec.  TIMESM=  %.31f  sec.  TIMEFM=  %.31f\ 

"  sec\n" . timeqm.  timesm.  timefm): 
fprintf (run_rslt."\n\NRUN#  %d.    VCUT*  %d.  QCUT=  %ld.  #TUP=  %d.  BASE=  *ld."\ 
"  VIEW=  %ld.  QUERY=  %ld"  ,  run.  viewcut.  querycut,  updt_siz,  countb. 
countv.  countq) : 
fprintf (run_rslt."\NFV=  %.2f.  FVA=  %f .    FQ*  %-2f.  FQA*  %f    P  =  *.2f",  fv.  fva. 

fq.  fqa.  P): 
fprintf (run_rslt. "\NTIMEQM=  *.31f  sec.  TIMESM*  *.31f  sec.  TIMEFM'  %.31f"\ 

"  sec\n" . timeqm.  timesm.  timefm); 
> 
void  compute  avg  time (int  Q,  double  'timeqm,  double  *timesm,  double  *timefm) 


if(Q  >  0) 
< 

♦timeqm  »  *timeqm  /  (double)Q 
•timesm  ■  *timesm  /  (double)Q 
•timefm  =  *timefm  /  (double)Q 
} 
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else 
{ 
printf ("\n\NERROR:  dividing  times  by  0.   ****  results  are  VOID  ****\n" ) 

> 
> 
void  compute_f v_and_f q_and_P( int  vmax.  int  vbase.  int  vincr,  int  vcut. 

float  *fv.long  qmax.  long  qbase.  long  qincr. 
long  qcut.  float  *fq.  int  K.  int  Q,  float  *P) 

{ 

*fv  =  (float) (vmax)  -  (( float ) (vcut  -  vbase)  /  ( float ) (vincr )) : 
*fv  =  (*fv  +  (float) (vincr)  /  ( float ) (vincr ) )  /  ( float ) (vmax) : 
*fq  =  (float) (qmax)  -  (( float ) (qcut  -  qbase)  /  ( float ) (qincr )) : 
*fq  =  (*fq  ♦  (float) (qincr)  /  ( float ) (qincr) )  /  ( float ) (qmax) : 
*P   =  (float) (K)/(float) (K  *  Q)  ; 
> 
void  compute  table  counts) long  *countb.  long  *countv.  long  *countq. 

long  querycut.  float  *fva.  float  *fqa) 
( 

exec  sql  begin  declare  section; 
1 ong  que  ry_cut : 
long  tbl_cnt: 
exec  sql  end  declare  section: 
query_cut  ■  querycut: 
exec  sql  create  table  base_mat 

(e_num  integer2.  ename  c20.  salary  integer4.  keyno  integer2): 
exec  sql  insert  into  base_mat  (e_num.  ename.  salary,  keyno) 

select  empfm.e_num.  empfm. ename.  empfm. salary,  posfm. keyno 
from  empfm.  posfm 
where  empfm. e_num  «  posfm. e_num; 
exec  sql  select  rowtot  »  count (e_num) 
into  : tbl_cnt 
from  base_mat; 
•countb  *  tbl_cnt: 

exec  sql  select  rowtot  =  count (e_num) 
into  :tbl_cnt 
from  full_mat: 
♦countv  =  tbl  cnt: 
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exec  sql  select  rowtot  =  count (e  num) 
into  : tbl_cnt 
from  full_mat 

where  salary  >=  :query_cut: 
•countq  =  tbl_cnt; 

*fva  =  ( float )(( double ) *countv  /  ( double ) *countb) ; 
*fqa  =  ( float )(( double) *countq  /  (double ) *countv)  : 
exec  sql  drop  base_mat; 
) 
void  ref resh_update_text_f ile ( long  card,  long  i.  long  update_siz) 
< 

long  update_base: 

int  num_of_f ields.  J.  change_field  =  4; 
char  f ile_name[size]  =  updatinfo,  *file_ptr: 
FILE  *updat_fl: 

struct  field_attrib 
( 

char   field_type: 
int    field_width: 
char   field_info: 
long   lower_bound: 
int    increment: 
long   upper_bound; 
struct        field_attrib   *next: 
>: 
struct  field_attrib  *first_field  =  NULL: 
struct  field_attrib  *current_f ield  *  NULL; 
struct  field_attrib  *print_ptr  *  NULL: 
file_ptr  •   &file_name[0] : 
update_base  ■  (i  *  update_siz)  ♦  card  +1:   /*  compute  new  key  base  number  */ 
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/**   Read  old  control  input  for  data  generation  program   **/ 
updat_fl  =  fopen(update_f ile.  "r")i 
if ( !updat_f 1) 
{ 

printf ( "\NERROR :  update  control  file  did  not  open  to  read"): 
fclosealK  )  : 
exec  sql  disconnect; 
exit(l) : 
} 
fscanf (updat_fl.  "%*d\n"): 
f scanf (updat_f 1 .  "%d\n".  &num_of_f ields) : 
fscanf (updat_fl.  "%*s\n"); 
for  (j  =  1:  j  <■  num_of_f ields:  J++) 
{ 

if  (j  -=  1) 
{ 

first_field  =  (struct  field_attrib*)malloc(sizeof (struct  field_attrib) ) : 
if  (first_field  «  NULL)  printf ( "\NERROR:  Memory  did  not  allocate!!!"): 
current_f ield  =  first_field; 
} 
else 
( 

current_field->next  = (struct  f ield_attrib*)malloc(sizeof (struct  f ield_attrib) ) ; 
current_f ield  =  current_f ield- >next : 
} 
current_f ield->next  *  NULL; 

fscanf (updat_fl.  "\n%c\n".  &current_f ield-> f ield_type)  : 
fscanf (updat_fl.  "%d\n".  &current_f ield->f ield_width) : 
fscanf (updat_fl.  "%c\n".  &current_f ield->f ield_info)  : 
fscanf (updat_fl,  "%ld\n".  &current_f ield->lower_bound) : 
fscanf (updat_fl,  "%d\n".  &current_f ield- > increment ) : 
f scanf (updat_fl,  "*ld\n",  Scurrent_f ield->upper_bound) : 
if  (J  "  change_f ield)       /*  changing  base  for  keyno  field  */ 
( 

current_f ield-> lower_bound  *  update_base; 
} 
) 
f close (updat  fl): 
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/**   write  updated  control  input  for  data  generation  program   **/ 
updat_fl  -    fopen(update_f ile.  "w" ) : 
if ( : updat_f 1 ) 
( 

printf ( "\NERROR:  update  control  file  did  not  open  to  write"): 
fclosealK  )  : 
exec  sql  disconnect; 
exit(l) : 
} 
fprintf (updat_fl.  "%ld\n".  update_siz): 
fprintf (updat_fl.  "*d\n".  num_of_f ields)  : 
fprintf (updat_fl.  "%s".  file_ptr)  : 
print_ptr  ■  first_field: 
while(print_ptr  !=  NULL) 
< 

"\n\n*c\n" .  print_ptr- > f ield_type) : 
"%d\n" .  print_ptr->field_width) ; 
"%c\n".  print_ptr->field_info) : 
"%ld\n".  print  ptr-> lower_bound) : 
"%d\n".  print_ptr->increment) : 
"SSld".  print  ptr->upper  bound); 


fprintf (updat_f 1 . 

fprintf (updat_f 1. 

fprintf (updat_fl. 

fprintf (updat_fl, 

fprintf (updat_fl. 

fprintf (updat_f 1 , 

print_ptr  =  print_ptr->next : 

) 
fclose (updat_f 1 ) ; 
system ( "datagen" ) ; 
> 
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